Improving compound-protein interaction prediction by building up highly credible negative samples.

MOTIVATION Computational prediction of compound-protein interactions (CPIs) is of great importance for drug design and development, as genome-scale experimental validation of CPIs is not only time-consuming but also prohibitively expensive. Although an increasing number of validated interactions is available, the performance of computational prediction approaches is severely impeded by the lack of reliable negative CPI samples. A systematic method of screening reliable negative samples is therefore critical to improving the performance of in silico prediction methods.

RESULTS This article aims at building up a set of highly credible negative samples of CPIs via an in silico screening method. Most existing computational models assume that similar compounds are likely to interact with similar target proteins, and they achieve remarkable performance. It is therefore rational to identify potential negative samples based on the converse negative proposition: proteins dissimilar to every known/predicted target of a compound are unlikely to be targeted by the compound, and vice versa. We integrated various resources, including chemical structures, chemical expression profiles and side effects of compounds, as well as amino acid sequences, protein-protein interaction networks and functional annotations of proteins, into a systematic screening framework. We first tested the screened negative samples on six classical classifiers, and all of these classifiers achieved remarkably higher performance on our negative samples than on randomly generated negative samples for both human and Caenorhabditis elegans. We then verified the negative samples on three existing prediction models, namely bipartite local model, Gaussian kernel profile and Bayesian matrix factorization, and found that the performance of these models is also significantly improved on the screened negative samples. Moreover, we validated the screened negative samples on a drug bioactivity dataset. Finally, we derived two sets of new interactions by training a support vector machine classifier on the positive interactions annotated in DrugBank and our screened negative interactions. The screened negative samples and the predicted interactions provide the research community with a useful resource for identifying new drug targets and a helpful supplement to the current curated compound-protein databases.

AVAILABILITY Supplementary files are available at: http://admis.fudan.edu.cn/negative-cpi/.
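The screening principle stated in the abstract (a protein dissimilar to every known target of a compound is a credible non-target) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `screen_negatives`, the single precomputed protein similarity, the threshold of 0.2 and the toy data are all assumptions made for illustration. The article integrates several similarity sources and also applies the converse rule on the compound side, neither of which is shown here.

```python
from typing import Dict, List, Set, Tuple


def screen_negatives(
    targets: Dict[str, Set[str]],
    protein_sim: Dict[Tuple[str, str], float],
    proteins: List[str],
    threshold: float = 0.2,  # hypothetical cutoff, not taken from the article
) -> List[Tuple[str, str, float]]:
    """Return (compound, protein, max_sim) triples in which the protein is
    dissimilar to every known target of the compound."""

    def sim(p: str, q: str) -> float:
        if p == q:
            return 1.0
        # Missing similarities are treated as 1.0 (assumption), so a pair
        # with unknown similarity is never accepted as a negative sample.
        return protein_sim.get((p, q), protein_sim.get((q, p), 1.0))

    negatives = []
    for compound, known in targets.items():
        if not known:
            continue  # no known targets: nothing to compare against
        for protein in proteins:
            if protein in known:
                continue  # known positive interaction
            max_sim = max(sim(protein, t) for t in known)
            if max_sim < threshold:
                negatives.append((compound, protein, max_sim))

    # Most dissimilar (most credible) candidates first.
    negatives.sort(key=lambda item: item[2])
    return negatives


# Toy data (hypothetical): two compounds, four proteins.
targets = {"c1": {"P1"}, "c2": {"P2", "P3"}}
proteins = ["P1", "P2", "P3", "P4"]
protein_sim = {
    ("P1", "P2"): 0.9, ("P1", "P3"): 0.1, ("P1", "P4"): 0.05,
    ("P2", "P3"): 0.6, ("P2", "P4"): 0.15, ("P3", "P4"): 0.3,
}
result = screen_negatives(targets, protein_sim, proteins, threshold=0.2)
print(result)
```

In this toy example only compound `c1` yields candidates: `P3` and `P4` are both dissimilar to its single target `P1`. For `c2`, `P4` is rejected because it is too similar to the target `P3`, even though it is dissimilar to `P2`; taking the maximum over all known targets is what enforces "dissimilar to every target".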
