Computing Therapy for Precision Medicine: Collaborative Filtering Integrates and Predicts Multi-entity Interactions

Biomedicine produces copious information that it cannot fully exploit. In particular, there is a considerable need to integrate knowledge from disparate studies to discover connections across domains. Here, we used a collaborative filtering approach, inspired by online recommendation algorithms, in which non-negative matrix factorization (NMF) predicts interactions among chemicals, genes, and diseases solely from pairwise information about their interactions. Applied to matrices derived from the Comparative Toxicogenomics Database (CTD), our approach successfully recovered the Chemical-Disease, Chemical-Gene, and Disease-Gene networks in 10-fold cross-validation experiments. In addition, each of these interaction matrices could be predicted from the other two. Integrating all three CTD interaction matrices with NMF led to good predictions of STRING, an independent, external network of protein-protein interactions. Finally, the approach could integrate the CTD and STRING interaction data to significantly improve Chemical-Gene cross-validation performance. In a time-stamped study, it also predicted information added to CTD after a given date using only data available before that date. We conclude that collaborative filtering can integrate information across multiple types of biological entities and that, as a first step towards precision medicine, it can compute drug repurposing hypotheses.
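The abstract does not spell out the factorization details, so the following is only a minimal sketch under stated assumptions, not the authors' code. It assumes weighted NMF fitted with multiplicative updates; the paper may use a different solver, such as alternating non-negativity-constrained least squares. It also assumes a hypothetical symmetric block layout in which the Chemical-Gene, Chemical-Disease, and Gene-Disease matrices (the Disease-Gene matrix transposed) occupy the off-diagonal blocks. Masked entries are ignored during fitting and predicted by `W @ H`. The sketch then scores a 10-fold cross-validation over Chemical-Gene pairs with a ROC AUC. The unobserved diagonal blocks, for example gene-gene, are filled in by the same reconstruction. That is one plausible way an integrated CTD factorization could be compared against an external network such as STRING. All function names (`masked_nmf`, `block_layout`, `cv_chemical_gene`, `synthetic_ctd`) and the toy data are illustrative; nothing here is real CTD data.

```python
import numpy as np


def masked_nmf(X, mask, rank, n_iter=300, eps=1e-9, seed=0):
    """Weighted NMF: minimise ||mask * (X - W @ H)||_F^2 subject to W, H >= 0.

    Multiplicative updates keep W and H non-negative. Entries with mask == 0
    do not influence the fit; their values are predicted by W @ H.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    MX = mask * X
    for _ in range(n_iter):
        H *= (W.T @ MX) / (W.T @ (mask * (W @ H)) + eps)
        W *= (MX @ H.T) / ((mask * (W @ H)) @ H.T + eps)
    return W @ H


def block_layout(cg, cd, gd):
    """Hypothetical layout: one symmetric matrix over chemicals, genes, diseases.

    cg: chemicals x genes, cd: chemicals x diseases, gd: genes x diseases.
    Diagonal blocks (e.g. gene-gene) get mask = 0, i.e. they are unobserved.
    """
    nc, ng = cg.shape
    nd = cd.shape[1]
    c, g, d = slice(0, nc), slice(nc, nc + ng), slice(nc + ng, nc + ng + nd)
    n = nc + ng + nd
    X, M = np.zeros((n, n)), np.zeros((n, n))
    for (r, s), block in [((c, g), cg), ((c, d), cd), ((g, d), gd)]:
        X[r, s], X[s, r] = block, block.T
        M[r, s], M[s, r] = 1.0, 1.0
    return X, M, (c, g, d)


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney pairwise statistic (ties count 1/2)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def cv_chemical_gene(cg, cd, gd, rank=5, folds=10, seed=0):
    """k-fold CV over Chemical-Gene entries: hide one fold, refit, score it."""
    X, M, (c, g, _) = block_layout(cg, cd, gd)
    idx = np.random.default_rng(seed).permutation(cg.size)
    labels, scores = [], []
    for fold in np.array_split(idx, folds):
        i, j = np.unravel_index(fold, cg.shape)
        rows, cols = i + c.start, j + g.start  # positions in the block matrix
        mask = M.copy()
        mask[rows, cols] = 0.0  # hide the held-out fold ...
        mask[cols, rows] = 0.0  # ... and its transposed copy, to avoid leakage
        pred = masked_nmf(X, mask, rank)
        labels.append(cg[i, j])
        scores.append(pred[rows, cols])
    return auc(np.concatenate(labels), np.concatenate(scores))


def synthetic_ctd(n_chem=30, n_gene=40, n_dis=20, latent=3, seed=42):
    """Toy stand-in for CTD (not real data): binary links from latent factors."""
    rng = np.random.default_rng(seed)
    Uc, Ug, Ud = (rng.random((n, latent)) for n in (n_chem, n_gene, n_dis))

    def top_quartile(S):
        return (S > np.quantile(S, 0.75)).astype(float)

    return top_quartile(Uc @ Ug.T), top_quartile(Uc @ Ud.T), top_quartile(Ug @ Ud.T)


if __name__ == "__main__":
    cg, cd, gd = synthetic_ctd()
    print(f"10-fold Chemical-Gene AUC: {cv_chemical_gene(cg, cd, gd):.3f}")
```

Masking both the held-out (chemical, gene) entry and its transposed copy is the key design choice in this layout. Without it, the symmetric matrix would leak each held-out label back into training and inflate the cross-validation score.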
