Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database

With the rapidly increasing availability of high-throughput screening (HTS) data in the public domain, for example in the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate probe development and drug discovery efforts in academia and to reduce their cost. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is in the public domain through PubChem and has been carefully collated using confirmation screens that validate active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework, BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure-activity relationship (QSAR) models are built using artificial neural networks (ANNs), support vector machines (SVMs), decision trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed, including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 are observed at a true positive rate (TPR) cutoff of 25%.
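To make the reported metric concrete, the following is a minimal sketch of how an enrichment value at a fixed TPR cutoff could be computed from ranked model scores. It assumes the common definition of enrichment as the fraction of actives among the selected compounds divided by the fraction of actives in the whole set; the abstract does not spell out the exact formula, and the function name `enrichment_at_tpr` and the toy data are hypothetical, not part of BCL::ChemInfo.

```python
import math


def enrichment_at_tpr(scores, labels, tpr_cutoff=0.25):
    """Enrichment at the smallest top-ranked selection reaching tpr_cutoff.

    Assumed definition (not taken from the source):
        enrichment = (TP / n_selected) / (n_actives / n_total)
    scores: higher means "more likely active"; labels: 1 = active, 0 = inactive.
    Tied scores are ranked in input order (no special tie handling).
    """
    n_total = len(labels)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    if not 0.0 < tpr_cutoff <= 1.0:
        raise ValueError("tpr_cutoff must be in (0, 1]")

    # Number of actives that must be recovered to reach the requested TPR.
    # The small epsilon guards against floating-point round-up in ceil().
    needed = max(1, math.ceil(tpr_cutoff * n_actives - 1e-9))

    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    found = 0
    for n_selected, (_, label) in enumerate(ranked, start=1):
        found += label
        if found >= needed:
            precision = found / n_selected    # actives among selected
            base_rate = n_actives / n_total   # actives in the full set
            return precision / base_rate
    raise RuntimeError("unreachable: all actives are in the ranking")


# Toy example: 8 compounds, 2 actives, the top-scored compound is active.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0]
print(enrichment_at_tpr(toy_scores, toy_labels, tpr_cutoff=0.25))  # 4.0
```

Under this definition, an enrichment of 1 corresponds to random selection, and the maximum achievable value is the inverse of the fraction of actives in the data set, so values are comparable across data sets only when their active fractions are similar.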
