Boosting Virtual Screening Enrichments with Data Fusion: Coalescing Hits from Two-Dimensional Fingerprints, Shape, and Docking

Virtual screening is an effective way to find hits in drug discovery, with approaches ranging from fast information-based similarity methods to more computationally intensive physics-based docking methods. However, the best approach for a given project is not clear in advance of the screen. In this work, we show that combining results from multiple methods using a standard score (Z-score) can significantly improve virtual screening enrichments over any of the single screening methods. We also show that an augmented Z-score, which considers the best two out of three scores for a given compound, outperforms previously published data fusion algorithms. We use three different virtual screening methods (two-dimensional (2D) fingerprint similarity, shape-based similarity, and docking) and study two different databases (DUD and MDDR). Compared with the top individual method, the average enrichment in the top 1% improved by 9% for DUD and 25% for MDDR. Compared with the average of the three individual methods, the improvements are 22% for DUD and 43% for MDDR. Statistics are presented that show a high significance associated with these findings.
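The fusion idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the code assumes that each method's scores are standardized with the population standard deviation, that docking scores are sign-flipped so that larger always means better, and that the "best two out of three" rule averages the two largest Z-scores per compound. The function names and the example scores are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores, higher_is_better=True):
    """Standardize one method's raw scores to zero mean and unit variance.

    Assumption: population standard deviation; scores where lower is
    better (e.g. docking energies) are sign-flipped so larger is better.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0] * len(scores)
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (s - mu) / sigma for s in scores]


def fuse_best_two(per_method_z):
    """Average the two largest Z-scores of each compound (assumed rule)."""
    fused = []
    for zs in zip(*per_method_z):
        top_two = sorted(zs, reverse=True)[:2]
        fused.append(mean(top_two))
    return fused


def rank_compounds(fused):
    """Return compound indices ordered from best to worst fused score."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


# Hypothetical scores for five compounds from the three method types.
fingerprint = [0.82, 0.35, 0.40, 0.10, 0.55]   # similarity, higher is better
shape = [0.60, 0.72, 0.30, 0.25, 0.58]         # similarity, higher is better
docking = [-9.1, -6.0, -8.7, -5.2, -7.5]       # score, lower is better

per_method_z = [
    z_scores(fingerprint),
    z_scores(shape),
    z_scores(docking, higher_is_better=False),
]
fused = fuse_best_two(per_method_z)
ranking = rank_compounds(fused)
print(ranking)
```

Averaging only the two best Z-scores lets a compound rank highly even when one of the three methods scores it poorly, which is one plausible reading of why such a rule could be more robust than a plain mean over all methods.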
