High-Quality Dataset of Protein-Bound Ligand Conformations and Its Application to Benchmarking Conformer Ensemble Generators

We developed a cheminformatics pipeline for the fully automated selection and extraction of high-quality protein-bound ligand conformations from X-ray structural data. The pipeline evaluates the validity and accuracy of the 3D structures of small molecules according to multiple criteria, including their fit to the electron density and their physicochemical and structural properties. Using this approach, we compiled two high-quality datasets from the Protein Data Bank (PDB), the Platinum datasets: a comprehensive dataset of 4626 structures and a diversified subset of 2912 structures. We used the datasets to benchmark seven freely available conformer ensemble generators: Balloon (two different algorithms), the RDKit standard conformer ensemble generator, the Experimental-Torsion basic Knowledge Distance Geometry (ETKDG) algorithm, Confab, Frog2 and Multiconf-DOCK. The individual algorithms differed substantially in performance, with RDKit and ETKDG generally achieving a favorable balance of accuracy, ensemble size and runtime. The Platinum datasets are available for download from http://www.zbh.uni-hamburg.de/platinum_dataset.
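To make the notion of benchmark accuracy concrete, the following is a minimal sketch of how per-ligand results might be summarized. It assumes (this is not stated in the abstract) that accuracy is measured as the minimum RMSD, in Å, between any member of a generated ensemble and the experimental protein-bound conformation. The function name `summarize_benchmark`, the thresholds and the example values are hypothetical and chosen for illustration only; they are not results from the study.

```python
from statistics import median


def summarize_benchmark(min_rmsds, thresholds=(0.5, 1.0, 2.0)):
    """Summarize per-ligand minimum RMSD values (in angstroms).

    min_rmsds: one value per ligand, assumed to be the lowest RMSD between
    any generated conformer and the experimental conformation.
    thresholds: hypothetical RMSD cutoffs for counting a ligand as reproduced.
    """
    if not min_rmsds:
        raise ValueError("need at least one RMSD value")
    n = len(min_rmsds)
    # Fraction of ligands whose best conformer falls within each cutoff.
    success_rate = {t: sum(r <= t for r in min_rmsds) / n for t in thresholds}
    return {"median": median(min_rmsds), "success_rate": success_rate}


# Made-up values for four ligands, for illustration only.
example_rmsds = [0.3, 0.8, 1.2, 2.5]
summary = summarize_benchmark(example_rmsds)
print(summary)
```

A summary of this kind would typically be reported alongside the mean ensemble size and runtime per molecule, since a generator can trade accuracy against either of them.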
