A comprehensive review of feature based methods for drug target interaction prediction

Drug-target interaction prediction is a prominent research area in drug discovery. It concerns identifying interactions between chemical compounds and protein targets in the human body. Wet-lab experiments to identify these interactions are expensive and time-consuming, and computational prediction methods help narrow the search space for those experiments. These computational methods can be divided into ligand-based, docking-based and chemogenomic approaches. This review describes the feature-based chemogenomic methods for drug-target interaction prediction and provides a comprehensive overview of the techniques, datasets, tools and metrics involved. The feature-based methods are categorized, explained and compared. A novel framework for drug-target interaction prediction is also proposed, with the aim of improving on the performance of existing methods. To the best of our knowledge, this is the first comprehensive review focusing only on feature-based methods for drug-target interaction prediction.
