Systematic Comparison of Machine Learning Methods for Identification of miRNA Species as Disease Biomarkers

MicroRNAs (miRNAs) play important roles in a variety of biological processes and can act as disease biomarkers, so methods for discovering disease-related miRNAs are needed. Human omics data, including miRNA expression profiles, typically contain far more descriptors (p) than samples (n), often by orders of magnitude; this is the so-called "p >> n problem". Because traditional statistical methods tend to be misled toward local solutions in this setting, machine learning (ML) methods that perform sparse variable selection are expected to address the problem. Among the many ML methods, the least absolute shrinkage and selection operator (LASSO) and multivariate adaptive regression splines (MARS) select a small number of variables through supervised learning against endpoints such as human disease status. Here, we systematically compared LASSO and MARS for biomarker discovery using six miRNA expression data sets from human disease samples, obtained from the NCBI Gene Expression Omnibus (GEO). We additionally applied partial least squares discriminant analysis (PLS-DA) as a traditional control method to establish the baseline performance of discriminant methods. LASSO and MARS performed relatively better than PLS-DA as the number of samples increased. In addition, some of the miRNA species identified by the ML methods have already been reported as candidate disease biomarkers in previous biological studies. These findings should extend our knowledge of how ML methods perform in the empirical use of clinical data.
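To make the idea of sparse selection under p >> n concrete, the following is a minimal sketch of LASSO fitted by coordinate descent on synthetic data. It is not the pipeline used in this study: the data are random numbers rather than miRNA profiles, the endpoint is continuous rather than a disease status, and the names `standardize`, `soft_threshold`, `lasso` and `lambda_max` are hypothetical helpers introduced only for illustration.

```python
import random


def standardize(X):
    """Return columns of X centred to mean 0 and scaled to unit variance."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - mean) / sd for v in col])
    return cols


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lambda_max(cols, y):
    """Smallest penalty at which every coefficient is exactly zero."""
    n = len(y)
    y_mean = sum(y) / n
    return max(abs(sum(x * (v - y_mean) for x, v in zip(col, y)) / n)
               for col in cols)


def lasso(cols, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1."""
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]          # residual for beta = 0
    beta = [0.0] * len(cols)
    for _ in range(n_sweeps):
        for j, col in enumerate(cols):
            # Correlation of feature j with the partial residual.
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - x * delta for x, r in zip(col, resid)]
                beta[j] = new
    return beta


# Synthetic p >> n setting (assumed sizes): 30 samples, 100 descriptors,
# of which only descriptors 0 and 1 carry signal.
rng = random.Random(0)
n, p = 30, 100
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]

cols = standardize(X)
lam = 0.5 * lambda_max(cols, y)
beta = lasso(cols, y, lam)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected descriptors:", selected)
```

The L1 penalty sets most coefficients exactly to zero, which is why the method returns a short list of candidate variables even when p far exceeds n. In practice the penalty would be chosen by cross-validation rather than fixed at half of `lambda_max` as assumed here.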
