DIVERSE: Bayesian Data IntegratiVE Learning for Precise Drug ResponSE Prediction

Detecting predictive biomarkers from multi-omics data is important for precision medicine, both to improve the diagnosis of complex diseases and to guide better treatments. Doing so requires substantial experimental effort, which is made difficult by the heterogeneity of cell lines and by high cost. An effective alternative is to build a computational model over diverse omics data, including genomic, molecular, and environmental information. However, choosing informative and reliable data sources among the different types of data is challenging. We propose DIVERSE, a framework of Bayesian importance-weighted tri- and bi-matrix factorization (DIVERSE3 or DIVERSE2), to predict drug responses from data on cell lines, drugs, and gene interactions. DIVERSE integrates the data sources systematically in a step-wise manner, examining the importance of each added data set in turn. More specifically, we sequentially integrate five different data sets, which have not all been combined in earlier bioinformatic methods for predicting drug responses. Empirical experiments show that DIVERSE clearly outperformed five other methods, including three state-of-the-art approaches, under cross-validation. The advantage was most pronounced in out-of-matrix prediction, which is closer to real use cases and more challenging than the simpler in-matrix prediction. Additionally, case studies on discovering new drugs further confirmed the performance advantage of DIVERSE.
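To make the distinction between in-matrix and out-of-matrix prediction concrete, the following is a minimal sketch of how the two kinds of cross-validation folds could be built for a response matrix. It is not the authors' code. The function names, the choice to hold out whole rows (standing in for, say, drugs never seen in training), and the round-robin fold assignment are all assumptions made for illustration.

```python
import random


def in_matrix_folds(n_rows, n_cols, n_folds, seed=0):
    """In-matrix split (illustrative): hold out individual (row, col) entries.

    Every row and column can still appear in the training entries.
    """
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    random.Random(seed).shuffle(cells)
    folds = [set() for _ in range(n_folds)]
    for pos, cell in enumerate(cells):
        folds[pos % n_folds].add(cell)  # round-robin assignment after shuffling
    return folds


def out_of_matrix_folds(n_rows, n_cols, n_folds, seed=0):
    """Out-of-matrix split (illustrative): hold out entire rows.

    A held-out row has no observed entries during training, so it must be
    predicted from side information alone. Holding out rows rather than
    columns is an assumption of this sketch.
    """
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)
    folds = [set() for _ in range(n_folds)]
    for pos, i in enumerate(rows):
        folds[pos % n_folds].update((i, j) for j in range(n_cols))
    return folds
```

Each function returns a list of test sets of (row, col) index pairs, with the training set for a fold being every entry outside it. In the out-of-matrix case a test row contributes no training entries at all, which is why this setting is the harder one.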
