Prediction of drug-target interaction based on protein features using undersampling and feature selection techniques with boosting.

Accurate identification of drug-target interactions (DTIs) is a crucial and challenging task in the drug discovery process, with substantial benefits for patients and pharmaceutical companies. Traditional wet-lab experiments for identifying DTIs are expensive, time-consuming, and labor-intensive. Many computational techniques have therefore been developed for this purpose, yet a large number of interactions remain undiscovered. Here, we present pdti-EssB, a new computational model for identifying DTIs from protein sequence and drug molecular structure. Each drug molecule is encoded as a molecular substructure fingerprint. Each protein sequence is represented by several descriptors that capture its evolutionary, sequence, and structural information. In addition, the proposed method uses data balancing techniques to handle class imbalance and applies a novel feature eliminator to select an optimal feature subset for accurate prediction. Four classes of DTI benchmark datasets are used to construct a predictive model with XGBoost. The area under the ROC curve (auROC), estimated with five-fold cross-validation, is used as the evaluation metric to compare pdti-EssB with recent methods. The experimental results indicate that the proposed method outperforms the other approaches in predicting DTIs and identifies new candidate drug-target interactions based on prediction probability scores. The pdti-EssB webserver is available online at http://pdtiessb-uestc.com/.
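
To make the balancing and evaluation steps named above concrete, the following is a minimal sketch in plain Python of random undersampling, a five-fold split, and auROC computed through the rank-sum identity. It is not the pdti-EssB implementation: the abstract does not specify the balancing technique or the feature eliminator, so the function names (`random_undersample`, `k_fold_indices`, `auroc`), the random-subset strategy, and the round-robin fold assignment are all assumptions made for illustration, and no classifier (such as XGBoost) or feature selection step is included.

```python
import random


def random_undersample(X, y, seed=0):
    """Keep every minority-class row and an equally sized random subset
    of the majority class (hypothetical helper, labels must be 0 or 1)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for k shuffled folds.
    Samples are dealt to folds round-robin after shuffling (an assumption)."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def auroc(y_true, scores):
    """auROC as the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; tied pairs count as 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("auROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A common way to combine these pieces, which the abstract neither confirms nor rules out, is to undersample only the training indices of each fold and to compute auROC on the untouched test fold, so that the reported score reflects the original class ratio.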
