Leveraging TCGA gene expression data to build predictive models for cancer drug response

Background: Machine learning has been used to predict cancer drug response by combining multi-omics data from cancer cell lines with the measured sensitivities of those cell lines to different therapeutic compounds. Here, we build machine learning models that use gene expression data from patients' primary tumor tissues to predict whether a patient will respond positively or negatively to two chemotherapeutics: 5-Fluorouracil and Gemcitabine.

Results: We focused on 5-Fluorouracil and Gemcitabine because, after applying our exclusion criteria, they had the largest numbers of patients in TCGA. Normalized gene expression data were clustered and used as the input features for the study. We used matching clinical trial data to ascertain these patients' responses, and compared multiple clustering and classification methods by how accurately they predicted drug response. CLARA and random forest were the best clustering and classification methods, respectively. Our models predict drug response with up to 86% accuracy, despite the study's limited sample size. We also found that the genes most informative for predicting drug response were enriched in well-known cancer signaling pathways, highlighting their potential significance in chemotherapy prognosis.

Conclusions: Primary tumor gene expression is a good predictor of cancer drug response. Investment in larger datasets containing both patient gene expression and drug response is needed to support future machine learning work. Ultimately, such predictive models may help oncologists make critical treatment decisions.
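
The abstract names CLARA (Clustering LARge Applications) as the best clustering method but does not say how it was configured or exactly what was clustered. The stdlib-only Python sketch below illustrates the general CLARA idea under stated assumptions: it runs a simplified PAM (k-medoids) on several random subsets and keeps the medoid set that has the lowest total distance over the full data. It assumes each gene is a vector of normalized expression values across patients and uses Euclidean distance. All function names, parameters (`n_samples`, `sample_size`), and the toy data are hypothetical. In R, a full implementation is `clara()` in the `cluster` package.

```python
import random

# Assumption: each "gene" is a list of normalized expression values across patients.


def dist(a, b):
    # Euclidean distance between two expression profiles
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(points, medoids):
    # Sum over all points of the distance to the nearest medoid
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(points, k):
    """Simplified PAM (k-medoids): swap a medoid for a non-medoid while the cost drops."""
    medoid_idx = list(range(k))  # naive start: the first k points
    best = total_cost(points, [points[i] for i in medoid_idx])
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(points)):
                if cand in medoid_idx:
                    continue
                trial = medoid_idx[:pos] + [cand] + medoid_idx[pos + 1:]
                cost = total_cost(points, [points[i] for i in trial])
                if cost < best:
                    best, medoid_idx, improved = cost, trial, True
    return [points[i] for i in medoid_idx]


def clara(points, k, n_samples=5, sample_size=40, seed=0):
    """CLARA-style clustering: PAM on random subsets, keep the medoids that best fit ALL points."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k)
        cost = total_cost(points, medoids)  # scored on the full data set, not the sample
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    # Assign every point to its nearest chosen medoid
    labels = [min(range(k), key=lambda j: dist(p, best_medoids[j])) for p in points]
    return best_medoids, labels


if __name__ == "__main__":
    gen = random.Random(1)
    # Hypothetical toy data: 30 "genes" measured in 4 "patients", in two groups
    genes = [[gen.gauss(0, 0.1) for _ in range(4)] for _ in range(15)]
    genes += [[gen.gauss(5, 0.1) for _ in range(4)] for _ in range(15)]
    medoids, labels = clara(genes, k=2, sample_size=10)
    print(labels)
```

Because PAM only ever runs on a small sample, the expensive swap search stays cheap, while the final choice is still judged against every point. This trade-off is why CLARA is suited to large gene sets.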
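
The abstract reports that random forest was the best classifier and that the most informative genes were identified, but it gives no implementation details. The following stdlib-only Python sketch is a toy random forest: it draws bootstrap samples of patients, considers a random subset of `mtry` features at each split, and combines trees by majority vote. It also accumulates a mean-decrease-in-Gini style importance per feature, which is one common way to rank predictive genes. Everything here is a hypothetical illustration and not the study's pipeline, including the function names, the shallow `depth`, the choice of Gini importance, and the toy cohort in which only gene 2 drives response. In R, the `randomForest` package reports a comparable `MeanDecreaseGini` measure.

```python
import random
from collections import Counter

# Hypothetical toy implementation; not the randomForest package or the authors' code.


def gini(labels):
    # Gini impurity: 1 - sum over classes of p_c^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    # Best (feature, threshold, impurity decrease) among the candidate features
    parent, n = gini(y), len(y)
    best = (None, None, 0.0)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            dec = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if dec > best[2]:
                best = (f, t, dec)
    return best


def majority(y):
    return Counter(y).most_common(1)[0][0]


def build_tree(X, y, depth, mtry, rng, importance):
    if depth == 0 or len(set(y)) == 1:
        return majority(y)  # leaf node
    features = rng.sample(range(len(X[0])), mtry)  # random feature subset per split
    f, t, dec = best_split(X, y, features)
    if f is None:
        return majority(y)
    importance[f] += dec * len(y)  # impurity decrease weighted by node size
    go_left = [row[f] <= t for row in X]
    kids = []
    for side in (True, False):
        sub = [i for i, g in enumerate(go_left) if g == side]
        kids.append(build_tree([X[i] for i in sub], [y[i] for i in sub],
                               depth - 1, mtry, rng, importance))
    return (f, t, kids[0], kids[1])


def predict_tree(node, row):
    while isinstance(node, tuple):  # internal nodes are (feature, threshold, left, right)
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(X, y, n_trees=50, depth=3, mtry=None, seed=0):
    rng = random.Random(seed)
    p = len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))  # sqrt(p) features per split
    importance = [0.0] * p
    trees = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in X]  # bootstrap sample of patients
        trees.append(build_tree([X[i] for i in boot], [y[i] for i in boot],
                                depth, mtry, rng, importance))
    total = sum(importance) or 1.0
    return trees, [v / total for v in importance]  # normalized Gini importance


def predict(trees, row):
    return majority([predict_tree(t, row) for t in trees])  # majority vote


def accuracy(trees, X, y):
    return sum(predict(trees, row) == yi for row, yi in zip(X, y)) / len(y)


if __name__ == "__main__":
    gen = random.Random(42)
    # Hypothetical toy cohort: 60 patients x 5 genes; only gene 2 drives response
    X = [[gen.gauss(0, 1) for _ in range(5)] for _ in range(60)]
    y = ["responder" if row[2] > 0 else "non-responder" for row in X]
    trees, imp = random_forest(X, y, n_trees=30)
    print("training accuracy:", accuracy(trees, X, y))
    print("most informative gene:", max(range(5), key=imp.__getitem__))
```

Ranking genes by their share of the total impurity decrease is what makes a list of "most informative genes" possible. Such a list could then be passed to a pathway enrichment tool.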
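
The abstract compares methods by prediction accuracy and notes the small sample size. A common way to make such comparisons is k-fold cross-validation, in which each model is scored only on patients it was not trained on. In the minimal stdlib-only Python sketch below, the two classifiers are hypothetical stand-ins rather than the methods used in the study: a majority-class baseline and a nearest-centroid rule. The fold count and toy data are also assumptions. The point of the sketch is the evaluation loop, and why a trivial baseline belongs next to any headline accuracy.

```python
import random


def k_fold_indices(n, k, seed=0):
    # Shuffle patient indices, then deal them into k roughly equal folds
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_baseline(train_X, train_y):
    # Hypothetical baseline: ignores features, predicts the most common training label
    top = max(set(train_y), key=train_y.count)
    return lambda row: top


def nearest_centroid(train_X, train_y):
    # Hypothetical classifier: the label whose mean feature vector is closest
    centroids = {}
    for label in set(train_y):
        rows = [x for x, yi in zip(train_X, train_y) if yi == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
    return predict


def cv_accuracy(fit, X, y, k=5, seed=0):
    # Mean accuracy over k folds; each fold is scored by a model fit on the other folds
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(sum(model(X[i]) == y[i] for i in fold) / len(fold))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gen = random.Random(7)
    # Hypothetical toy cohort: 25 responders and 25 non-responders, 3 genes each
    X = [[gen.gauss(1, 1) for _ in range(3)] for _ in range(25)]
    X += [[gen.gauss(-1, 1) for _ in range(3)] for _ in range(25)]
    y = ["responder"] * 25 + ["non-responder"] * 25
    for name, fit in [("majority baseline", majority_baseline),
                      ("nearest centroid", nearest_centroid)]:
        print(name, round(cv_accuracy(fit, X, y), 2))
```

On this balanced toy cohort, the majority baseline cannot exceed 0.5 under 5-fold cross-validation, while the nearest-centroid rule scores well above it. With small or imbalanced cohorts, however, a baseline can score surprisingly high, so reporting it helps put an accuracy figure in context.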
