Tree-based machine learning algorithms identified a minimal set of miRNA biomarkers for breast cancer diagnosis and molecular subtyping.

Breast cancer is a complex disease, and its effective treatment requires affordable signatures for diagnosis and subtyping. While the use of machine learning in clinical computational biology is still in its infancy, the prevalent approach to identifying molecular biomarkers remains the screening of all candidates by differential expression analysis. Many such attempts used miRNA expression data in breast cancer and produced a multitude of differentially expressed miRNAs; hence, the minimal set of miRNA biomarkers needed to classify breast cancer is yet to be identified. The availability of large and diverse cancer datasets such as The Cancer Genome Atlas (TCGA) has facilitated the molecular profiling of patients' tumors and introduced new challenges, such as deriving clinical-grade interpretations from big data. In this study, the miRNA expression dataset of breast cancer patients from the TCGA database was used to develop prediction models, from which miRNA biomarkers were identified for the diagnosis and molecular subtyping of this cancer. I took advantage of the interpretability of tree-based classification models to extract their rules and identify a minimal set of biomarkers. Empirical negative control miRNAs in breast cancer were obtained and used to normalize the dataset. The tree-based machine learning models trained in my analysis used hsa-miR-139 with hsa-miR-183 to separate breast tumors from normal samples, and hsa-miR-4728 with hsa-miR-190b to further classify these tumors into three major subtypes of breast cancer. In addition to the proposed biomarkers, the miRNAs most important for breast cancer classification are also presented.
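The idea of reading a minimal biomarker set off an interpretable tree can be illustrated with a small sketch. It is not the code used in this study, and the abstract does not state the software or thresholds involved. Python is used purely for illustration. The feature names miR-A to miR-D, the expression values, and the labelling rule are hypothetical placeholders rather than TCGA data. The sketch fits a shallow Gini-based decision tree, lists the features it actually splits on, and prints its decision rules.

```python
from itertools import product

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in sorted(set(labels)))

def best_split(rows, labels, features):
    """Return (feature, threshold, weighted impurity) of the best split, or None."""
    best = None
    for f in features:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring observed values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

def build_tree(rows, labels, features, max_depth):
    """Grow a binary classification tree greedily, up to max_depth levels."""
    majority = max(sorted(set(labels)), key=labels.count)
    split = None
    if max_depth > 0 and gini(labels) > 0.0:
        split = best_split(rows, labels, features)
    if split is None:
        return {"label": majority}
    f, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], features, max_depth - 1),
        "right": build_tree([r for r, _ in right], [y for _, y in right], features, max_depth - 1),
    }

def used_features(node):
    """Set of features the tree splits on: the candidate minimal biomarker set."""
    if "label" in node:
        return set()
    return {node["feature"]} | used_features(node["left"]) | used_features(node["right"])

def extract_rules(node, path=()):
    """Flatten the tree into (condition, predicted label) pairs, one per leaf."""
    if "label" in node:
        return [(" AND ".join(path) if path else "always", node["label"])]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], path + (f"{f} <= {t}",))
            + extract_rules(node["right"], path + (f"{f} > {t}",)))

def predict(node, row):
    """Follow the splits from the root to a leaf and return its label."""
    while "label" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]

# Hypothetical toy data: four placeholder miRNAs on a small grid of values.
# Only miR-A and miR-B influence the label; miR-C and miR-D are uninformative.
features = ["miR-A", "miR-B", "miR-C", "miR-D"]
rows = [dict(zip(features, v))
        for v in product((2, 4, 6, 8), (1, 3, 5, 9), (0, 1), (0, 1))]
labels = ["tumor" if r["miR-A"] < 5 or r["miR-B"] > 7 else "normal" for r in rows]

tree = build_tree(rows, labels, features, max_depth=2)
selected = used_features(tree)
rule_list = extract_rules(tree)

print("features used by the tree:", sorted(selected))
for condition, label in rule_list:
    print(f"IF {condition} THEN {label}")
```

On this toy grid the tree splits only on miR-A and miR-B and ignores the two uninformative features. This is the sense in which a shallow tree's rules can point to a small biomarker set. Real expression data would need normalization and held-out validation before any such reading is trusted.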
