Predicting chemotherapy response using a variational autoencoder approach

Background Multiple studies have shown the utility of transcriptome-wide RNA-seq profiles as features for machine learning-based prediction of response to chemotherapy in cancer. While tumor transcriptome profiles are publicly available for thousands of tumors across many cancer types, only a relatively modest number of these profiles are clinically annotated for response to chemotherapy. The paucity of labeled examples and the high dimension of the feature data limit the performance of fully supervised classification methods for predicting therapeutic response. Recently, multiple studies have established the utility of a deep neural network approach, the variational autoencoder (VAE), for generating meaningful latent features from original data. Here, we report the first study of a semi-supervised approach using VAE-encoded tumor transcriptome features and regularized gradient-boosted decision trees (XGBoost) to predict chemotherapy drug response for five cancer types: colon, pancreatic, bladder, breast, and sarcoma.
Results We found: (1) VAE encoding of the tumor transcriptome preserves the cancer type identity of the tumor, suggesting preservation of biologically relevant information; and (2) as a feature set for supervised classification to predict response to chemotherapy, the unsupervised VAE encoding of the tumor's gene expression profile yields a higher area under the receiver operating characteristic curve and a higher area under the precision-recall curve than the original gene expression profile, its PCA principal components, or its ICA components, in four of the five cancer types that we tested.
Conclusions Given high-dimensional “omics” data, the VAE is a powerful tool for obtaining a nonlinear low-dimensional embedding; it yields features that retain biological patterns that distinguish between different types of cancer and that enable more accurate tumor transcriptome-based prediction of response to chemotherapy than would be possible using the original data or their principal components.
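For readers unfamiliar with the VAE, the following is a minimal, illustrative sketch of the per-sample objective that such a model minimizes: a reconstruction term plus the closed-form KL divergence between the encoder's diagonal Gaussian and a standard normal prior. It is not the authors' implementation. The function names are hypothetical, and the squared-error reconstruction term is an assumption made here for simplicity; the study may have used a different reconstruction loss.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1) per latent dimension.

    mu and log_var are the encoder outputs (lists of equal length);
    sigma is recovered as exp(0.5 * log_var).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def vae_loss(x, x_recon, mu, log_var):
    """Reconstruction error plus KL regularizer for one sample.

    Assumption: squared error stands in for the reconstruction term.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -1.0], [0.0, -2.0]   # toy encoder outputs
    z = reparameterize(mu, log_var, rng)      # latent features for this sample
    print("latent sample:", z)
    print("KL term:", kl_to_standard_normal(mu, log_var))
```

In a semi-supervised pipeline of the kind described above, the encoder would be trained on all available (unlabeled) tumor profiles, and the encoder mean vector `mu` for each labeled tumor would then serve as the input feature vector for the supervised classifier.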
