Drug-target interaction prediction by integrating chemical, genomic, functional and pharmacological data.

In silico prediction of unknown drug-target interactions (DTIs) has become a popular tool for drug repositioning and drug development. A key challenge is integrating multiple types of data to predict DTIs accurately. Although recent studies have demonstrated that genomic, chemical and pharmacological data can provide reliable information for DTI prediction, it remains unclear whether functional information on proteins can also contribute to this task, and little work has been done to combine such information with other data to identify new interactions between drugs and targets. In this paper, we introduce functional data into DTI prediction and construct a biological space for targets using a functional similarity measure. We present a probabilistic graphical model, a conditional random field (CRF), that systematically integrates genomic, chemical, functional and pharmacological data, together with the topology of DTI networks, into a unified framework for predicting missing DTIs. Tests on two benchmark datasets show that our method achieves excellent prediction performance, with an area under the precision-recall curve (AUPR) of up to 94.9. These results demonstrate that our CRF model can successfully exploit heterogeneous data to capture the latent correlations of DTIs, and thus will be practically useful for drug repositioning. Supplementary Material is available at http://iiis.tsinghua.edu.cn/~compbio/papers/psb2014/psb2014_sm.pdf.
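As a reading aid for the AUPR figure quoted above, the following is a minimal sketch of how an area under the precision-recall curve can be approximated from ranked predictions, using the average-precision (step-wise) estimate. It is not the evaluation code used in the paper: the function name `average_precision`, the toy labels, and the toy scores are all hypothetical, and the exact interpolation scheme the authors used is not stated here.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Approximate AUPR by average precision over a ranked list.

    labels: 1 for a known drug-target interaction, 0 otherwise.
    scores: predicted interaction scores (higher = more likely).
    Assumption: tied scores are ordered arbitrarily by the sort.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank candidate drug-target pairs from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where this positive is recovered.
            precision_sum += true_positives / rank

    # Average the precision values over all positives.
    return precision_sum / n_pos


# Hypothetical toy example: four candidate drug-target pairs.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_aupr = average_precision(toy_labels, toy_scores)
print(f"approximate AUPR: {toy_aupr:.3f}")
```

AUPR is often preferred over the ROC curve for DTI benchmarks because known interactions are far outnumbered by non-interacting pairs, and precision is sensitive to false positives among the top-ranked predictions.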
