Analysis of Next-Generation Sequencing Data of miRNA for the Prediction of Breast Cancer

Next-Generation Sequencing (NGS) has recently emerged as a revolutionary technique in ‘-omics’ research. The Cancer Genome Atlas (TCGA) is a prominent example, providing a massive amount of sequencing data for miRNA and mRNA. Analysing these data could yield useful biological insight, and a prognostic system built on them could substantially aid cancer diagnosis. Hence, in this article, we analyse miRNA sequencing data for the accurate prediction of breast cancer. miRNAs are small non-coding RNAs that have been shown to participate in several carcinogenic processes, acting either as tumor suppressors or as oncogenes. This is one reason the clinical treatment of breast cancer patients has changed in recent years. Thus, it is of interest to understand the role of miRNAs in the prediction of breast cancer. In this regard, we have developed a technique that uses the Gravitational Search Algorithm to optimize the classification performance of a Support Vector Machine. The proposed technique selects the potential features, in this case miRNAs, in order to achieve better prediction accuracy. In this study, we achieved a classification accuracy of up to 95.29 % using \({\simeq }\)1.5 % of the miRNAs in the whole dataset, selected automatically. Thereafter, the selected miRNAs are ranked and listed. Of the top 15 miRNAs in the list, 6 are associated with breast cancer, 5 are associated with other cancer types, and 4 are unknown miRNAs. The performance of the proposed technique is compared with seven other state-of-the-art techniques. Finally, the results are justified by means of a statistical test along with a biological significance analysis of the selected miRNAs.
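
The wrapper idea described above, a Gravitational Search Algorithm searching over feature subsets and scored by classifier accuracy, can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the continuous encoding with a 0.5 threshold, the parameter values (`g0`, `alpha`, population size), the toy data, and all function names are hypothetical. To keep the sketch dependency-free, a leave-one-out nearest-centroid classifier stands in for the Support Vector Machine; in practice the `fitness` argument would be replaced by a cross-validated SVM accuracy.

```python
import math
import random


def centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the
    features where mask is True. Stand-in for SVM accuracy (assumption)."""
    idx = [d for d, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            s = sums.setdefault(label, [0.0] * len(idx))
            for k, d in enumerate(idx):
                s[k] += row[d]
            counts[label] = counts.get(label, 0) + 1
        best_label, best_dist = None, float("inf")
        for label, s in sums.items():
            dist = sum((X[i][d] - s[k] / counts[label]) ** 2
                       for k, d in enumerate(idx))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def gsa_select(X, y, n_agents=10, n_iter=20, g0=1.0, alpha=20.0,
               seed=0, fitness=centroid_accuracy):
    """Gravitational-search feature selection (illustrative sketch).
    Each agent is a point in [0, 1]^dim; feature d is kept if x_d > 0.5."""
    rng = random.Random(seed)
    dim = len(X[0])
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    best_mask, best_fit = None, -1.0
    for t in range(n_iter):
        masks = [[p > 0.5 for p in agent] for agent in pos]
        fits = [fitness(X, y, m) for m in masks]
        for m, f in zip(masks, fits):
            if f > best_fit:
                best_mask, best_fit = m, f
        # Fitter agents get larger (normalised) masses.
        lo, hi = min(fits), max(fits)
        raw = [(f - lo) / (hi - lo) if hi > lo else 1.0 for f in fits]
        total = sum(raw)
        mass = [r / total for r in raw]
        g = g0 * math.exp(-alpha * t / n_iter)  # decaying gravitational constant
        k = max(1, round(n_agents * (1 - t / n_iter)))  # shrinking attractor set
        kbest = sorted(range(n_agents), key=lambda a: fits[a], reverse=True)[:k]
        # Compute every acceleration first, then move all agents together.
        accs = []
        for i in range(n_agents):
            acc = [0.0] * dim
            for j in kbest:
                if j == i:
                    continue
                dist = math.sqrt(sum((a - b) ** 2
                                     for a, b in zip(pos[i], pos[j])))
                w = rng.random() * g * mass[j] / (dist + 1e-9)
                for d in range(dim):
                    acc[d] += w * (pos[j][d] - pos[i][d])
            accs.append(acc)
        for i in range(n_agents):
            for d in range(dim):
                vel[i][d] = rng.random() * vel[i][d] + accs[i][d]
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
    return best_mask, best_fit


# Toy data (hypothetical): feature 0 separates the classes, the rest is noise.
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[5.0 * label + data_rng.gauss(0, 1)] +
     [data_rng.gauss(0, 1) for _ in range(5)] for label in y]

mask, acc = gsa_select(X, y)
print("selected features:", [d for d, keep in enumerate(mask) if keep])
print("leave-one-out accuracy:", acc)
```

The search is seeded, so repeated runs on the same data return the same subset. Ranking the selected miRNAs, as described in the abstract, would require an additional scoring step that this sketch does not attempt.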
