Utilizing random Forest QSAR models with optimized parameters for target identification and its application to target-fishing server

Background
The identification of target molecules is important for “target deconvolution” in phenotypic screening and for understanding the “polypharmacology” of drugs. Because conventional methods of identifying targets are time-consuming and costly, in-silico target identification has been considered an alternative solution. One well-known in-silico approach to target identification relies on structure–activity relationships (SARs). SAR methods have advantages such as low computational cost and high feasibility; however, their dependence on available data leads to an imbalance of active data and ambiguity of inactive data across targets.

Results
We developed a ligand-based virtual screening model comprising 1121 target SAR models built with a random forest algorithm. The performance of each target model was evaluated using the ROC curve and the mean score under internal five-fold cross-validation. In addition, recall rates for the top-k targets were calculated to assess target ranking performance. A benchmark model using an optimized sampling method and optimized parameters was evaluated on an external validation set. It achieved recall rates of 67.6% and 73.9% for the top-11 (1% of all targets) and top-33 targets, respectively. We provide a publicly available website, http://rfqsar.kaist.ac.kr, where users can search the top-k targets for query ligands.

Conclusions
The target models we built can be used both to predict the activity of ligands toward each target and to rank candidate targets for a query ligand using a unified scoring scheme. The scores are additionally fitted to probabilities so that users can estimate how likely a ligand–target interaction is to be active. The website's user interface is user-friendly and intuitive, offering useful information and cross-references.
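To make the "recall rate for top-k targets" concrete, here is a minimal sketch, assuming each query ligand receives one score per target model and that recall is counted over known active ligand–target pairs (a target counts as recovered if it appears in that ligand's top-k). The abstract does not spell out the exact definition, so the per-pair counting and the helper names `rank_targets` and `recall_at_k` are hypothetical illustrations, not the authors' code.

```python
from typing import Dict, List, Set


def rank_targets(scores: Dict[str, float]) -> List[str]:
    """Return target IDs sorted by descending model score (ties broken by ID)."""
    return sorted(scores, key=lambda t: (-scores[t], t))


def recall_at_k(
    query_scores: Dict[str, Dict[str, float]],
    true_targets: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of known active ligand-target pairs found in each ligand's top-k targets."""
    hits = 0
    total = 0
    for ligand, actives in true_targets.items():
        top_k = set(rank_targets(query_scores[ligand])[:k])
        hits += len(actives & top_k)  # true targets recovered for this ligand
        total += len(actives)
    return hits / total if total else 0.0


# Toy example with three hypothetical targets and two query ligands.
scores = {
    "lig1": {"T1": 0.9, "T2": 0.4, "T3": 0.7},
    "lig2": {"T1": 0.2, "T2": 0.8, "T3": 0.5},
}
truth = {"lig1": {"T3"}, "lig2": {"T2"}}
print(recall_at_k(scores, truth, k=1))  # 0.5
print(recall_at_k(scores, truth, k=2))  # 1.0

# With 1121 target models, 1% of all targets corresponds to k = 11.
top1pct = round(0.01 * 1121)
```

Because every target model produces a score on a shared (unified) scale, ranking reduces to a sort over those scores, which is what makes a single top-k cutoff meaningful across all 1121 targets.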

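The abstract states that model scores are fitted to probabilities but does not say how. One common option is a Platt-style logistic fit, P(active | s) = 1 / (1 + exp(-(a·s + b))), whose parameters are estimated on scored ligand–target pairs with known activity labels. The sketch below uses that technique as an assumption, not necessarily the method used for the server. It fits a and b by plain batch gradient descent on log loss, and the function names `fit_platt` and `score_to_probability` are hypothetical.

```python
import math
from typing import List, Tuple


def fit_platt(
    scores: List[float], labels: List[int], lr: float = 0.5, epochs: int = 5000
) -> Tuple[float, float]:
    """Fit P(active | s) = 1 / (1 + exp(-(a*s + b))) by batch gradient descent on log loss.

    Assumption: a Platt-style sigmoid calibration; scores are expected in a
    small range such as [0, 1] so the exponent stays well-behaved.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s  # d(log loss)/da
            grad_b += p - y        # d(log loss)/db
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def score_to_probability(score: float, a: float, b: float) -> float:
    """Map a raw model score to an estimated probability of activity."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


# Hypothetical calibration data: model scores and observed activity (1 = active).
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
cal_labels = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_platt(cal_scores, cal_labels)
for s in (0.2, 0.5, 0.8):
    print(s, round(score_to_probability(s, a, b), 3))
```

Because the fitted sigmoid is monotonic in the score whenever a > 0, this kind of calibration changes how a score is read (as an estimated probability) without changing the order of the ranked target list.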