A new evolutionary rough fuzzy integrated machine learning technique for microRNA selection using next-generation sequencing data of breast cancer

MicroRNAs (miRNAs) play an important role in various biological processes by regulating gene expression, and their abnormal expression may lead to cancer. Analysing miRNA expression data may therefore yield biological insights useful for cancer diagnosis. Many feature selection methods have recently been developed to identify such miRNAs. Because the task is challenging, each of these methods has its own merits and demerits. In this article, we propose a novel wrapper-based feature selection technique that integrates Rough and Fuzzy sets, Random Forest and Particle Swarm Optimization to identify putative miRNAs that effectively solve the underlying biological problem, i.e. separating tumour and control samples. Here, Rough and Fuzzy sets address the vagueness and overlapping characteristics of the dataset during clustering, while Random Forest performs classification on the clustering results to yield better solutions. The integrated clustering and classification tasks form the underlying optimization problem for Particle Swarm Optimization, in which each particle encodes a set of features, in this case miRNAs. The performance of the proposed wrapper-based method is demonstrated quantitatively and visually on next-generation sequencing data of breast cancer from The Cancer Genome Atlas (TCGA). Finally, the selected miRNAs are validated through biological significance tests. The code and dataset used in this paper are available online.
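
The wrapper idea described above, particles that encode feature subsets and are scored by a learner, can be illustrated with a minimal sketch. It is not the authors' implementation: the rough-fuzzy clustering and Random Forest steps are replaced here by a simple nearest-centroid classifier so that the example needs only the Python standard library. The toy data, the function names (`make_toy_data`, `fitness`, `binary_pso`), the per-feature penalty and all parameter values are assumptions made for illustration.

```python
import math
import random


def make_toy_data(rng, n_samples=60, n_features=8, informative=(0, 3)):
    """Two-class toy data; only the `informative` columns separate the classes."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def fitness(X, y, mask, penalty=0.01):
    """Stand-in objective (assumption): nearest-centroid training accuracy on
    the selected columns, minus a small cost per selected feature."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return -1.0  # an empty subset cannot classify anything
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, mu))
                for c, mu in centroids.items()}
        correct += int(min(dist, key=dist.get) == t)
    return correct / len(X) - penalty * len(cols)


def binary_pso(X, y, n_particles=12, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: each particle is a 0/1 mask over features."""
    rng = random.Random(seed)
    d = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(d):
                # Velocity is pulled towards the personal and global best masks.
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))
                # A sigmoid turns the velocity into the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


X, y = make_toy_data(random.Random(42))
best_mask, best_fit = binary_pso(X, y)
print("selected feature indices:", [j for j, b in enumerate(best_mask) if b])
print("fitness:", round(best_fit, 3))
```

In the method described in the abstract, the fitness of a particle would instead come from the integrated rough-fuzzy clustering and Random Forest classification, and the columns would be miRNA expression values from the TCGA breast cancer data rather than synthetic features.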
