PubChem3D: conformer ensemble accuracy

Background
PubChem is a free, publicly available resource containing substance descriptions and their associated biological activity information. PubChem3D is an extension to PubChem containing computationally derived three-dimensional (3-D) structures of small molecules. All the tools and services that are part of PubChem3D rely on the quality of the 3-D conformer models. Construction of the conformer models currently available in PubChem3D involves a clustering stage to sample the conformational space spanned by the molecule. While this stage reduces the conformer models to a more manageable size, it may also reduce their ability to reproduce experimentally determined “bioactive” conformations, such as those found for PDB ligands. This study examines the extent of this accuracy loss and considers its effect on the 3-D similarity analysis of molecules.

Results
Conformer models of up to 100,000 conformers per compound were generated for 47,123 small molecules whose structures were experimentally determined, and the conformers in each model were clustered to reduce its size to a maximum of 500 conformers per molecule. The accuracy of the conformer models before and after clustering was evaluated using five measures: root-mean-square distance (RMSD); the shape-optimized shape-Tanimoto (ST^ST-opt) and combo-Tanimoto (ComboT^ST-opt); and the color-optimized color-Tanimoto (CT^CT-opt) and combo-Tanimoto (ComboT^CT-opt). On average, clustering decreased conformer model accuracy: it increased the conformer ensemble’s RMSD to the bioactive conformer (by 0.18 ± 0.12 Å) and decreased the ST^ST-opt, ComboT^ST-opt, CT^CT-opt, and ComboT^CT-opt scores (by 0.04 ± 0.03, 0.16 ± 0.09, 0.09 ± 0.05, and 0.15 ± 0.09, respectively).

Conclusion
This study shows that the RMSD accuracy of the PubChem3D conformer models is as designed.
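The RMSD-based ensemble accuracy, and why clustering can only degrade it, can be illustrated with a small pure-Python sketch. It assumes that an ensemble's accuracy is the RMSD of its best-matching conformer to the bioactive reference after optimal rigid-body superposition (the Kabsch criterion, evaluated here from singular values), with atoms already matched one-to-one by index; it does not reproduce PubChem3D's actual RMSD protocol. The helper names and the greedy `leader_cluster` routine are hypothetical illustrations, not the clustering method PubChem3D uses. Because clustering keeps a subset of the original conformers, the best-match RMSD after clustering is never lower than before.

```python
import math


def _centered(coords):
    """Translate a list of (x, y, z) points so their centroid is at the origin."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in coords]


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _sym3_eigvals(a):
    """Eigenvalues (descending) of a symmetric 3x3 matrix, closed-form trigonometric method."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # matrix is already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p = math.sqrt((sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1) / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    phi = math.acos(max(-1.0, min(1.0, _det3(b) / 2.0))) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e1, 3.0 * q - e1 - e3, e3]


def superposed_rmsd(a, b):
    """RMSD between two conformers with atoms matched by index, after the
    optimal proper rigid-body superposition (Kabsch). With singular values
    s1 >= s2 >= s3 of H = A^T B (centered coordinates), the minimal sum of
    squared deviations is |A|^2 + |B|^2 - 2 * (s1 + s2 + sign(det H) * s3)."""
    A, B = _centered(a), _centered(b)
    n = len(A)
    h = [[sum(A[k][i] * B[k][j] for k in range(n)) for j in range(3)] for i in range(3)]
    hth = [[sum(h[k][i] * h[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    s = [math.sqrt(max(0.0, e)) for e in _sym3_eigvals(hth)]
    d = 1.0 if _det3(h) >= 0.0 else -1.0  # proper rotations only, no reflections
    ga = sum(x * x for p in A for x in p)
    gb = sum(x * x for p in B for x in p)
    return math.sqrt(max(0.0, ga + gb - 2.0 * (s[0] + s[1] + d * s[2])) / n)


def ensemble_rmsd(ensemble, reference):
    """Ensemble accuracy: RMSD of the best-matching conformer to the reference."""
    return min(superposed_rmsd(conf, reference) for conf in ensemble)


def leader_cluster(ensemble, threshold, step, max_size):
    """Hypothetical greedy leader clustering (not PubChem3D's procedure): keep a
    conformer only if it lies more than `threshold` RMSD from every kept one,
    and raise the threshold by `step` until at most `max_size` remain."""
    if step <= 0 or max_size < 1:
        raise ValueError("step must be positive and max_size at least 1")
    while True:
        reps = []
        for conf in ensemble:
            if all(superposed_rmsd(conf, r) > threshold for r in reps):
                reps.append(conf)
        if len(reps) <= max_size:
            return reps
        threshold += step
```

Evaluating the RMSD from the singular values avoids constructing the rotation matrix and keeps the sketch dependency-free; in practice one would more likely use a numerical library's SVD or a cheminformatics toolkit's alignment routine. The clustering loss for one molecule is then `ensemble_rmsd(reps, ref) - ensemble_rmsd(ensemble, ref)`, which is always zero or positive.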
In addition, the analysis of how PubChem3D sampling affects 3-D similarity measures shows a linear degradation of average accuracy with increasing molecular size and flexibility. Generally speaking, one can likely expect at least 90% of the PubChem3D ensembles to reach a worst-case (minimum) accuracy of 0.75, 1.09, 0.43, and 1.13 or better, in terms of ST^ST-opt, ComboT^ST-opt, CT^CT-opt, and ComboT^CT-opt, respectively. This expected accuracy improves linearly as the molecule becomes smaller or less flexible.
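The "90% or more" statement can be read as a percentile: the largest score that at least 90% of ensembles meet or exceed. The sketch below computes that floor from a list of per-ensemble best scores (for example, shape-optimized shape-Tanimoto values) and fits an ordinary least-squares line of score against a flexibility descriptor such as rotatable-bond count, which is one plausible way to quantify a linear degradation. The function names and the toy numbers are hypothetical and are not PubChem3D code or results.

```python
import math


def coverage_floor(scores, coverage=0.9):
    """Largest score that at least `coverage` of the ensembles meet or exceed.

    `scores` holds one best-match value per ensemble (higher is better, as for
    Tanimoto-type scores). The small epsilon guards against floating-point
    error in coverage * n (e.g. 0.9 * 10).
    """
    if not scores or not 0.0 < coverage <= 1.0:
        raise ValueError("need at least one score and 0 < coverage <= 1")
    k = max(1, math.ceil(coverage * len(scores) - 1e-9))
    return sorted(scores, reverse=True)[k - 1]


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    # Toy numbers for illustration only (not PubChem3D results).
    rot_bonds = [0, 2, 4, 6, 8]
    best_st = [0.95, 0.91, 0.87, 0.83, 0.79]
    slope, intercept = linear_fit(rot_bonds, best_st)
    print(f"slope per rotatable bond: {slope:.3f}")  # negative here: scores drop with flexibility
    print(f"90% floor: {coverage_floor(best_st):.2f}")
```

Applying `coverage_floor` separately within bins of molecular size or flexibility, and then fitting a line through the per-bin floors, would be one way to check how the expected worst-case accuracy changes as molecules become smaller or less flexible.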
