CSAR Data Set Release 2012: Ligands, Affinities, Complexes, and Docking Decoys

A major goal in drug design is the improvement of computational methods for docking and scoring. The Community Structure Activity Resource (CSAR) has collected several data sets from industry and added in-house data sets that may be used for this purpose (www.csardock.org). CSAR has so far obtained data from Abbott, GlaxoSmithKline, and Vertex and is working on obtaining data from several others. Combined with our in-house projects, we are providing a data set consisting of 6 protein targets, 647 compounds with biological affinities, and 82 crystal structures. Multiple congeneric series are available for several targets, with a few representative crystal structures for each series. These series generally contain a few inactive compounds, usually not available in the literature, to provide an upper bound to the affinity range. The affinity ranges are typically 3–4 orders of magnitude per series. For our in-house projects, we have had compounds synthesized for biological testing. Affinities were measured by Thermofluor, Octet RED, and isothermal titration calorimetry for the most soluble compounds. Measuring the same compounds by more than one method allows direct comparison of their biological affinities and provides a measure of the variance in the experimental affinity. There can be considerable variance in the absolute value of the affinity, which makes the prediction of the absolute value ill-defined. However, the relative rankings within each method are much more consistent, which fits the observation that predicting relative ranking is a more tractable problem computationally. For the in-house compounds, we have also measured the following physical properties: logD, logP, thermodynamic solubility, and pKa. This data set also provides a substantial decoy set for each target, consisting of diverse conformations covering the entire active site for all of the 58 CSAR-quality crystal structures.
The CSAR data sets (CSAR-NRC HiQ and the 2012 release) provide substantial, publicly available, curated data sets for use in parametrizing and validating docking and scoring methods.
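
The contrast drawn above between absolute affinity values and relative rankings can be illustrated with a short sketch. This is not the analysis used for the CSAR data; it is a minimal example assuming affinities are expressed as pKd values for the same compounds measured by two assays. The function names and the numbers in `assay_a` and `assay_b` are invented for illustration and are not taken from the data set.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_abs_diff(x, y):
    """Mean absolute difference between paired measurements."""
    return mean(abs(a - b) for a, b in zip(x, y))


# Hypothetical pKd values for five compounds (illustration only, not CSAR data).
assay_a = [5.1, 6.0, 6.8, 7.4, 8.2]
assay_b = [6.0, 6.7, 7.9, 8.1, 9.3]

print("mean |difference| in pKd:", round(mean_abs_diff(assay_a, assay_b), 2))
print("Spearman rank correlation:", round(spearman(assay_a, assay_b), 2))
```

In this made-up case the two assays disagree by almost one log unit on average, yet they order the compounds identically, which is the situation in which ranking is well-defined even though the absolute value is not.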
