Kernel-based data fusion improves the drug-protein interaction prediction

Proteins are involved in almost every action of every organism by interacting with other small molecules including drugs. Computationally predicting the drug-protein interactions is particularly important in speeding up the process of developing novel drugs. To borrow the information from existing drug-protein interactions, we need to define the similarity among proteins and the similarity among drugs. Usually these similarities are defined based on one single data source and many methods have been proposed. However, the availability of many genomic and chemogenomic data sources allows us to integrate these useful data sources to improve the predictions. Thus a great challenge is how to integrate these heterogeneous data sources. Here, we propose a kernel-based method to predict drug-protein interactions by integrating multiple types of data. Specially, we collect drug pharmacological and therapeutic effects, drug chemical structures, and protein genomic information to characterize the drug-target interactions, then integrate them by a kernel function within a support vector machine (SVM)-based predictor. With this data fusion technology, we establish the drug-protein interactions from a collections of data sources. Our new method is validated on four classes of drug target proteins, including enzymes, ion channels (ICs), G-protein couple receptors (GPCRs), and nuclear receptors (NRs). We find that every single data source is predictive and integration of different data sources allows the improvement of accuracy, i.e., data integration can uncover more experimentally observed drug-target interactions upon the same levels of false positive rate than single data source based methods. The functional annotation analysis indicates that our new predictions are worthy of future experimental validation. In conclusion, our new method can efficiently integrate diverse data sources, and will promote the further research in drug discovery.

[1]  Yoshihiro Yamanishi,et al.  Supervised prediction of drug–target interactions using bipartite local models , 2009, Bioinform..

[2]  Stuart L. Schreiber,et al.  Dissecting glucose signalling with diversity-oriented synthesis and small-molecule microarrays , 2002, Nature.

[3]  Alex Smola,et al.  Kernel methods in machine learning , 2007, math/0701907.

[4]  Yong Wang,et al.  Computationally Probing Drug-Protein Interactions Via Support Vector Machine , 2010 .

[5]  Yoshihiro Yamanishi,et al.  Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework , 2010, Bioinform..

[6]  Bernhard Schölkopf,et al.  Support Vector Machine Applications in Computational Biology , 2004 .

[7]  William Stafford Noble,et al.  Kernel methods for predicting protein-protein interactions , 2005, ISMB.

[8]  Michael Gribskov,et al.  Use of Receiver Operating Characteristic (ROC) Analysis to Evaluate Sequence Matching , 1996, Comput. Chem..

[9]  Dekang Lin,et al.  An Information-Theoretic Definition of Similarity , 1998, ICML.

[10]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[11]  Yoshihiro Yamanishi,et al.  Prediction of drug–target interaction networks from the integration of chemical and genomic spaces , 2008, ISMB.

[12]  M. Kanehisa,et al.  Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. , 2003, Journal of the American Chemical Society.

[13]  Edward Y. Chang,et al.  Class-Boundary Alignment for Imbalanced Dataset Learning , 2003 .

[14]  Nello Cristianini,et al.  Kernel-Based Data Fusion and Its Application to Protein Function Prediction in Yeast , 2003, Pacific Symposium on Biocomputing.

[15]  Shiwen Zhao,et al.  Network-Based Relating Pharmacological and Genomic Spaces for Drug Target Identification , 2010, PloS one.

[16]  S. Haggarty,et al.  Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays. , 2003, Chemistry & biology.

[17]  Kiyoko F. Aoki-Kinoshita,et al.  From genomics to chemical genomics: new developments in KEGG , 2005, Nucleic Acids Res..

[18]  David S. Wishart,et al.  DrugBank: a knowledgebase for drugs, drug actions and drug targets , 2007, Nucleic Acids Res..

[19]  Luhua Lai,et al.  Prediction of potential drug targets based on simple sequence properties , 2007, BMC Bioinformatics.

[20]  William Stafford Noble,et al.  Choosing negative examples for the prediction of protein-protein interactions , 2006, BMC Bioinformatics.

[21]  Nai-Yang Deng,et al.  Support Vector Machines: Optimization Based Theory, Algorithms, and Extensions , 2012 .

[22]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[23]  Antje Chang,et al.  New Developments , 2003 .

[24]  Robert B. Russell,et al.  SuperTarget and Matador: resources for exploring drug-target relationships , 2007, Nucleic Acids Res..