Basic Overview of Chemoinformatics

There is no particular point in time at which chemoinformatics was founded or established. It evolved slowly from several, often quite humble, beginnings. Scientists in various fields of chemistry struggled to develop computer methods that would allow them to manage the enormous amount of chemical information and to find relationships between the structure and the properties of a compound. Some early developments appeared during the 1960s and led to a flurry of activity in the 1970s. This review provides a general overview of basic methods in the specific fields of chemoinformatics, from encoding chemical compounds, through storing and searching data in databases, to generating and analyzing these data. In addition, the chief interconnecting points of chemoinformatics applications are highlighted, including the contributions of Johann Gasteiger to this field.
