Most Ligand-Based Benchmarks Measure Overfitting Rather than Accuracy

Undetected overfitting can occur when training and validation data are significantly redundant. We describe AVE, a new measure of training-validation redundancy for ligand-based classification problems that accounts for the similarity among inactive molecules as well as active ones. We investigated seven widely used benchmarks for virtual screening and classification and found that the amount of AVE bias strongly correlates with the performance of ligand-based predictive methods, irrespective of the predicted property, chemical fingerprint, similarity measure, or previously applied unbiasing techniques. The previously reported performance of most ligand-based methods may therefore be explained by overfitting to benchmarks rather than by good prospective accuracy.
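The abstract does not give the formula for AVE, so the following is only a minimal sketch of one way a training-validation redundancy score that treats actives and inactives symmetrically could be computed. It assumes binary fingerprints, Tanimoto distance, and a nearest-neighbor coverage averaged over distance thresholds. The function names (`tanimoto_distance`, `nn_coverage`, `redundancy_bias`) and the threshold grid are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def tanimoto_distance(a, b):
    """Pairwise Tanimoto distance between rows of two binary fingerprint matrices."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    inter = (a[:, None, :] & b[None, :, :]).sum(axis=2)
    union = (a[:, None, :] | b[None, :, :]).sum(axis=2)
    # Assumption: two all-zero fingerprints count as identical (similarity 1).
    sim = np.where(union > 0, inter / np.maximum(union, 1), 1.0)
    return 1.0 - sim

def nn_coverage(valid, train, thresholds):
    """Average, over thresholds, of the fraction of validation molecules
    whose nearest training molecule lies within the threshold distance."""
    nearest = tanimoto_distance(valid, train).min(axis=1)
    return float(np.mean([(nearest <= d).mean() for d in thresholds]))

def redundancy_bias(valid_act, valid_inact, train_act, train_inact, thresholds=None):
    """Hypothetical bias score: how much closer each validation class sits
    to its own training class than to the opposite training class."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)  # assumed threshold grid
    act_term = (nn_coverage(valid_act, train_act, thresholds)
                - nn_coverage(valid_act, train_inact, thresholds))
    inact_term = (nn_coverage(valid_inact, train_inact, thresholds)
                  - nn_coverage(valid_inact, train_act, thresholds))
    return act_term + inact_term

# Toy data: validation molecules duplicate their own training class,
# and the two classes share no fingerprint bits.
train_act = [[1, 1, 0, 0], [1, 0, 0, 0]]
train_inact = [[0, 0, 1, 1], [0, 0, 0, 1]]
bias_redundant = redundancy_bias(train_act, train_inact, train_act, train_inact)

# Control: all four sets are the same molecules, so neither class is favored.
bias_neutral = redundancy_bias(train_act, train_act, train_act, train_act)
```

In this sketch a score near zero means validation molecules are no closer to same-class training molecules than to opposite-class ones, while a large positive score means a simple nearest-neighbor lookup could already separate the validation set. Including the inactive term is what distinguishes this from measures that only compare actives.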
