A review of chemical structure retrieval systems

This paper describes the development and current state of the art of computerized systems for the storage and retrieval of chemical structure information. The main types of machine-readable structure representation (fragmentation codes, linear notations and connection tables) are described, together with the retrieval algorithms that provide structure and substructure search facilities. Current research in chemical structure retrieval includes the development of techniques for representing and searching the generic structures that occur in chemical patents, for searching files of three-dimensional structures, and for ranking searches designed to identify compounds structurally similar to a given query compound, as well as the use of parallel computers to increase the efficiency of substructure searching. Chemical structure handling techniques are also applicable in a range of other areas, including chemical reaction indexing, computer-aided synthesis design and structure elucidation, and substructural analysis methods for the study of quantitative structure-activity relationships.
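As a minimal illustration of two of the ideas named above, the following Python sketch represents each molecule as a small connection table (a list of atoms plus a list of bonds) and derives a set of fragment screens from it. The screens serve both as a quick pre-filter for substructure search and as the basis of a similarity score for ranking. The function names, the dictionary layout, and the choice of single atoms and bonded atom pairs as fragments are assumptions made for this example; they are not taken from any system covered by the review. A screen match is only a necessary condition, so an operational system would follow it with an atom-by-atom graph match.

```python
# Illustrative sketch only: names and data layout are hypothetical.

BOND_SYMBOLS = {1: "-", 2: "=", 3: "#"}


def make_table(atoms, bonds):
    """Build a connection table from element symbols and (i, j, order) bonds."""
    return {"atoms": list(atoms), "bonds": list(bonds)}


def fragment_screens(table):
    """Return the set of atom and bonded-pair fragments present in a table."""
    atoms = table["atoms"]
    frags = set(atoms)
    for i, j, order in table["bonds"]:
        a, b = sorted((atoms[i], atoms[j]))  # canonical order within the pair
        frags.add(f"{a}{BOND_SYMBOLS[order]}{b}")
    return frags


def passes_screen(query, candidate):
    """True if every query fragment also occurs in the candidate.

    This is a necessary, not sufficient, condition for a substructure match.
    """
    return fragment_screens(query) <= fragment_screens(candidate)


def tanimoto(a, b):
    """Tanimoto coefficient on fragment sets, usable for similarity ranking."""
    fa, fb = fragment_screens(a), fragment_screens(b)
    return len(fa & fb) / len(fa | fb)


# Heavy-atom connection tables (hydrogens omitted for brevity).
ethanol = make_table(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
acetic_acid = make_table(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
carbonyl = make_table(["C", "O"], [(0, 1, 2)])  # query substructure C=O
```

With these definitions, `passes_screen(carbonyl, acetic_acid)` is true and `passes_screen(carbonyl, ethanol)` is false, so only acetic acid would go forward to the more expensive atom-by-atom comparison. `tanimoto(ethanol, acetic_acid)` evaluates to 0.8, because the two molecules share four of their five distinct fragments.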
