Identification of miRNA Biomarkers for Diverse Cancer Types Using Statistical Learning Methods at the Whole-Genome Scale

Genome-wide analysis of miRNA molecules can reveal important information for understanding the biology of cancer. Typically, miRNAs are used as features in statistical learning methods to train models that predict cancer. This motivates us to propose a method that integrates clustering and classification techniques for diverse cancer types with survival analysis via regression, in order to identify miRNAs that can potentially play a crucial role in the prediction of different types of tumors. Our method has two parts. The first part is a feature selection procedure, called the stochastic covariance evolutionary strategy with forward selection (SCES-FS), which is developed by integrating stochastic neighbor embedding (SNE), the covariance matrix adaptation evolutionary strategy (CMA-ES), and classifiers, with the primary objective of selecting biomarkers. SNE is used to reorder the features by performing an implicit clustering with highly correlated neighboring features. A subset of features is then selected heuristically to perform multi-class classification for diverse cancer types. In the second part of our method, the most important features identified in the first part are used to perform survival analysis via Cox regression, primarily to examine the effectiveness of the selected features. For this purpose, we have analyzed next-generation sequencing data from The Cancer Genome Atlas in the form of miRNA expression of 1,707 samples of 10 different cancer types and 333 normal samples. The SCES-FS method is compared with well-known feature selection methods and is found to perform better in multi-class classification, achieving an accuracy of 96% with the 17 selected miRNAs. Moreover, the biological significance of the selected miRNAs is demonstrated with the help of network analysis, expression analysis using hierarchical clustering, KEGG pathway analysis, GO enrichment analysis, and protein-protein interaction analysis. Overall, the results indicate that the 17 selected miRNAs are associated with many key cancer regulators, such as MYC, VEGFA, AKT1, CDKN1A, RHOA, and PTEN, through their targets. Therefore, the selected miRNAs can be regarded as putative biomarkers for 10 types of cancer.
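
The forward-selection idea behind the first part can be illustrated with a minimal sketch. This is not the SCES-FS procedure itself: it omits the SNE reordering and the CMA-ES search, swaps in a simple nearest-centroid classifier scored by leave-one-out accuracy, and runs on synthetic data rather than TCGA miRNA expression. The function names `nearest_centroid_accuracy` and `forward_selection`, the class labels, and all parameter values are assumptions made for illustration only.

```python
import random


def nearest_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given feature indices."""
    correct = 0
    for i in range(len(X)):
        # Build per-class centroids from every sample except sample i.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1
        # Predict the class whose centroid is closest in squared Euclidean distance.
        best_label, best_dist = None, float("inf")
        for label, acc in sums.items():
            dist = sum((X[i][f] - acc[k] / counts[label]) ** 2
                       for k, f in enumerate(features))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def forward_selection(X, y, max_features):
    """Greedily add the feature that most improves accuracy; stop when nothing improves."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(nearest_centroid_accuracy(X, y, selected + [f]), f)
                  for f in remaining]
        score, feature = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Synthetic stand-in for expression data (an assumption, not the TCGA data):
# three classes, features 0 and 1 carry the class signal, features 2-5 are noise.
rng = random.Random(0)
class_means = {"A": (0.0, 0.0), "B": (5.0, 0.0), "C": (0.0, 5.0)}
X, y = [], []
for label, (m0, m1) in class_means.items():
    for _ in range(20):
        informative = [rng.gauss(m0, 0.5), rng.gauss(m1, 0.5)]
        noise = [rng.gauss(0.0, 0.5) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)

selected, score = forward_selection(X, y, max_features=4)
print("selected feature indices:", selected)
print("leave-one-out accuracy:", score)
```

In this toy setting neither informative feature separates all three classes on its own, so the greedy loop has to combine them, which mirrors why a wrapper-style selection evaluates subsets with a classifier rather than ranking features one at a time.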
