LigandBox: A database for 3D structures of chemical compounds

A database of 3D structures of available compounds is essential for virtual screening by molecular docking. We have developed the LigandBox database (http://ligandbox.protein.osaka-u.ac.jp/ligandbox/), which contains four million available compounds collected from the catalogues of 37 commercial suppliers, together with approved drugs and biochemical compounds taken from the KEGG_DRUG, KEGG_COMPOUND and PDB databases. Each chemical compound in the database has several 3D conformers with hydrogen atoms and atomic charges, ready to be docked into receptors using docking programs. The 3D conformations were generated using our molecular simulation program package, myPresto. Various physical properties, such as aqueous solubility (LogS) and carcinogenicity, have also been calculated to characterize the ADME-Tox properties of the compounds. The Web database provides two services for compound searches: a property/chemical ID search and a chemical structure search. The chemical structure search combines a descriptor search with a maximum common substructure (MCS) search, using our program kcombu. By specifying a query chemical structure, users can find similar compounds among the millions of compounds in the database within a few minutes. Our database is expected to assist a wide range of researchers in the fields of medical science, chemical biology, and biochemistry who are seeking to discover active chemical compounds by virtual screening.
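The combination of a descriptor search with an MCS search can be read as a two-stage filter: a cheap descriptor comparison narrows the library, and the costly structure comparison runs only on the survivors. The following is a minimal sketch of that idea, not the kcombu implementation. The Tanimoto coefficient, the threshold value, the set-of-strings descriptor format, and the names `two_stage_search` and `fine_similarity` (a hypothetical stand-in for an MCS-based score) are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Assumption: a descriptor is a set of feature labels (e.g. atom-pair strings).
Descriptor = FrozenSet[str]


def tanimoto(a: Descriptor, b: Descriptor) -> float:
    """Tanimoto coefficient of two feature sets (1.0 if both are empty)."""
    union = len(a | b)
    return 1.0 if union == 0 else len(a & b) / union


def two_stage_search(
    query: Descriptor,
    library: Dict[str, Descriptor],
    fine_similarity: Callable[[str], float],
    prefilter_threshold: float = 0.5,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Return up to top_k (compound_id, score) pairs, best score first.

    fine_similarity is a hypothetical callable that scores one library
    compound against the query, standing in for an MCS comparison.
    """
    # Stage 1: cheap descriptor prefilter over the whole library.
    candidates = [
        cid for cid, desc in library.items()
        if tanimoto(query, desc) >= prefilter_threshold
    ]
    # Stage 2: expensive comparison, applied only to the candidates.
    scored = [(cid, fine_similarity(cid)) for cid in candidates]
    # Sort by descending score, breaking ties by compound ID.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]
```

The design choice is that the prefilter only has to be permissive enough not to discard true hits; the ranking itself comes from the second stage.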
