Machine learning for biomarker identification in cancer research - developments toward its clinical application.

Patterns identified from systematically collected molecular profiles of patient tumor samples, together with clinical metadata, can guide personalized treatment and the effective management of cancer patients with similar molecular subtypes. There is an unmet need for computational algorithms for cancer diagnosis, prognosis and therapeutics that can identify complex patterns and support classification based on the plethora of emerging cancer research outcomes in the public domain. Machine learning, a branch of artificial intelligence, holds great potential for pattern recognition in cryptic cancer datasets, as is evident from the recent literature. In this review, we focus on the current status of machine learning applications in cancer research, highlighting trends and analyzing major achievements, roadblocks and challenges toward its implementation in the clinic.
