ChemDB: a public database of small molecules and related chemoinformatics resources

MOTIVATION: The development of chemoinformatics has been hampered by the lack of large, publicly available, comprehensive repositories of molecules, in particular of small molecules. Small molecules play a fundamental role in organic chemistry and biology. They can be used as combinatorial building blocks for chemical synthesis, as molecular probes in chemical genomics and systems biology, and for the screening and discovery of new drugs and other useful compounds.

RESULTS: We describe ChemDB, a public database of small molecules available on the Web. ChemDB is built from the digital catalogs of over a hundred vendors and other public sources, and is annotated with information derived from these sources as well as information obtained by computational methods, such as predicted solubility and three-dimensional structure. It supports multiple molecular formats and is updated periodically, automatically whenever possible. The current version of the database contains approximately 4.1 million commercially available compounds, or 8.2 million when isomers are counted. The database includes a user-friendly graphical interface, chemical reaction capabilities, and unique search capabilities.

AVAILABILITY: The database and datasets are available at http://cdb.ics.uci.edu.
