CODE-AE: A Coherent De-confounding Autoencoder for Predicting Patient-Specific Drug Response From Cell Line Transcriptomics

Accurate and robust prediction of patients' responses to drug treatments is critical for developing precision medicine. However, it is often difficult to obtain enough coherent drug response data directly from patients to train a generalized machine learning model. Although rich cell line data offer an alternative, transferring knowledge obtained from cell lines to patients is challenging because of various confounding factors. Few existing transfer learning methods can reliably disentangle common intrinsic biological signals from confounding factors in the cell line and patient data. In this paper, we develop a Coherent De-confounding Autoencoder (CODE-AE) that can extract both common biological signals shared by incoherent samples and private representations unique to each data set, transfer knowledge learned from cell line data to tissue data, and separate confounding factors from them. Extensive studies on multiple data sets demonstrate that CODE-AE significantly improves accuracy and robustness over state-of-the-art methods in both predicting patient drug response and de-confounding biological signals. Thus, CODE-AE provides a useful framework for taking advantage of in vitro omics data to develop generalized patient predictive models. The source code is available at this https URL.
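The shared/private decomposition described above can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear encoders, the random stand-in data, the layer sizes, and the orthogonality penalty (in the style of domain separation approaches) are all illustrative choices, and names such as `domain_losses` and `orthogonality_penalty` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_weights(n_in, n_out):
    """Random matrix standing in for a trained linear layer (assumption)."""
    return rng.normal(scale=0.1, size=(n_in, n_out))

def orthogonality_penalty(shared, private):
    """Squared Frobenius norm of shared^T @ private.

    It is zero when every shared dimension is orthogonal to every private
    dimension, which pushes the two embeddings to carry different information.
    """
    return float(np.sum((shared.T @ private) ** 2))

def domain_losses(x, w_shared, w_private, w_decoder):
    """Encode one data set and return (shared embedding, recon loss, ortho loss)."""
    shared = x @ w_shared        # signal intended to be common to both data sets
    private = x @ w_private      # signal intended to be specific to this data set
    recon = np.concatenate([shared, private], axis=1) @ w_decoder
    recon_loss = float(np.mean((x - recon) ** 2))
    return shared, recon_loss, orthogonality_penalty(shared, private)

# Hypothetical sizes and synthetic expression matrices (samples x genes).
n_genes, n_latent = 20, 4
data = {
    "cell_line": rng.normal(size=(8, n_genes)),
    "patient": rng.normal(loc=0.5, size=(6, n_genes)),
}

w_shared = random_weights(n_genes, n_latent)                 # one encoder for both
w_private = {name: random_weights(n_genes, n_latent) for name in data}
w_decoder = random_weights(2 * n_latent, n_genes)            # one decoder for both

total_loss = 0.0
embeddings = {}
for name, x in data.items():
    shared, recon_loss, ortho_loss = domain_losses(
        x, w_shared, w_private[name], w_decoder
    )
    embeddings[name] = shared
    total_loss += recon_loss + ortho_loss

print({name: emb.shape for name, emb in embeddings.items()}, total_loss)
```

In a trained model, the shared embeddings would feed a drug response predictor fit on cell line labels and applied to patient samples; the sketch stops at the loss computation and performs no optimization.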
