Identifying predictive features in drug response using machine learning: opportunities and challenges.

This article reviews several machine learning techniques for identifying a small number of features, from among tens of thousands of measured features, that can accurately predict a drug response. Prediction problems are divided into two categories: sparse classification and sparse regression. In classification, the clinical parameter to be predicted is binary, whereas in regression it is a real number. Well-known methods for both classes of problems are briefly discussed. These include the SVM (support vector machine) for classification, and ridge regression, LASSO (least absolute shrinkage and selection operator), and EN (elastic net) for regression. In addition, several well-established methods that do not fall directly within machine learning theory are also reviewed, including neural networks, PAM (prediction analysis for microarrays), SAM (significance analysis of microarrays), GSEA (gene set enrichment analysis), and k-means clustering. Several references illustrating the application of these methods to cancer biology are discussed.
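To make the idea of sparse regression concrete, the following is a minimal sketch of LASSO fitted by cyclic coordinate descent, written in plain Python. It is an illustration under stated assumptions, not the procedure used in the article: the synthetic data (40 samples, 200 features, of which only three influence the response), the penalty value 0.3, and all function names are hypothetical choices made for this example.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: closed-form solution of the 1-D LASSO problem."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_objective(X, y, w, lam):
    """Return (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1."""
    n = len(X)
    sq_err = sum((y[i] - sum(xij * wj for xij, wj in zip(X[i], w))) ** 2 for i in range(n))
    return sq_err / (2 * n) + lam * sum(abs(wj) for wj in w)


def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimize the LASSO objective by updating one coefficient at a time."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def make_synthetic_data(n=40, p=200, seed=0):
    """Hypothetical data: only the first three features affect the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    true_w = [3.0, -2.0, 1.5] + [0.0] * (p - 3)
    y = [sum(X[i][j] * true_w[j] for j in range(3)) + rng.gauss(0.0, 0.1) for i in range(n)]
    return X, y


X, y = make_synthetic_data()
w = lasso_coordinate_descent(X, y, lam=0.3)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("features with nonzero weight:", selected)
print("objective at solution:", lasso_objective(X, y, w, 0.3))
```

The soft-thresholding step is what sets most coefficients exactly to zero, which is the property that makes LASSO a feature selection method. Ridge regression replaces the absolute-value penalty with a squared penalty and therefore shrinks coefficients without zeroing them, while the elastic net combines both penalties.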
