Identification of Potential Drug-targets by Combining Evolutionary Information Extracted from Frequency Profiles and Molecular Topological Structures.

Identifying interactions between drug compounds and target proteins is the basis of drug research and plays a crucial role in drug discovery. However, determining drug-target interactions (DTIs) and potential protein-compound interactions by experimental methods alone is complicated, expensive and time-consuming. Hence, there is strong motivation to design in silico prediction methods that overcome these obstacles. In this work, we designed a novel in silico strategy to predict proteome-scale DTIs, based on the assumption that DTI pairs can be expressed through the evolutionary information derived from frequency profiles and through the structural properties of drugs. To achieve this, drug molecules are encoded as substructure fingerprints that represent the presence of certain fragments; target proteins are first converted into a Position-Specific Scoring Matrix (PSSM) and then encoded as 2-dimensional Principal Component Analysis (2DPCA) descriptors. In the prediction phase, a feature-weighted Rotation Forest (RF) classifier is used to estimate whether a drug and a target interact, on four benchmark datasets: Enzymes, Ion Channels, GPCRs and Nuclear Receptors. The cross-validation prediction accuracy on the four datasets is 95.40%, 88.82%, 85.67%, and 82.22%, respectively. To assess the proposed approach more clearly, we compared it with a Discrete Cosine Transform (DCT) descriptor model, a Support Vector Machine (SVM) classifier model and existing approaches, including DBSI, NetCBP, KBMF2K, SIMCOMP and RFDT. The experimental results indicate that the proposed approach can effectively improve DTI prediction accuracy and can be used as a practical tool for the research and design of new drugs.
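
The feature construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each PSSM has already been brought to a fixed 20 x 20 shape (the abstract does not say how variable sequence lengths are handled), it uses random arrays in place of real PSSMs and fingerprints, and the names `fit_2dpca`, `transform_2dpca`, the 16-bit fingerprint length and the choice of 3 projection axes are all hypothetical. The 2DPCA step follows the standard formulation: build the image covariance matrix from the centered matrices, keep its leading eigenvectors, and project each matrix onto them.

```python
import numpy as np


def fit_2dpca(matrices, n_components):
    """Return the top 2DPCA projection axes for a stack of equally sized matrices."""
    stack = np.asarray(matrices, dtype=float)          # shape (M, m, n)
    centered = stack - stack.mean(axis=0)
    # Image covariance G = (1/M) * sum_k (A_k - mean)^T (A_k - mean), shape (n, n)
    scatter = np.einsum("kij,kil->jl", centered, centered) / stack.shape[0]
    eigvals, eigvecs = np.linalg.eigh(scatter)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # indices of the largest ones
    return eigvecs[:, order]                           # shape (n, n_components)


def transform_2dpca(matrix, axes):
    """Project one matrix onto the 2DPCA axes and flatten it into a descriptor."""
    return (np.asarray(matrix, dtype=float) @ axes).ravel()


# Placeholder data (assumption): 5 fixed-size PSSM-like matrices, one toy fingerprint.
rng = np.random.default_rng(0)
pssm_like = rng.normal(size=(5, 20, 20))
fingerprint = rng.integers(0, 2, size=16)              # hypothetical 16-bit fingerprint

axes = fit_2dpca(pssm_like, n_components=3)
protein_vec = transform_2dpca(pssm_like[0], axes)      # 20 rows x 3 axes = 60 values

# One drug-target pair is represented by concatenating both descriptors.
pair_vec = np.concatenate([fingerprint, protein_vec])
print(pair_vec.shape)
```

A vector like `pair_vec`, labeled as interacting or non-interacting, is the kind of input a classifier such as the Rotation Forest would then be trained on; the classifier itself is left out of this sketch.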
