Chemical Similarity Searching

This paper reviews the use of similarity searching in chemical databases. It begins by introducing the concept of similarity searching, differentiating it from the more common substructure searching, and then discusses the current generation of fragment-based measures used for searching chemical structure databases. The next sections focus on two of the principal characteristics of a similarity measure: the coefficient that quantifies the degree of structural resemblance between pairs of molecules, and the structural representations that characterize the molecules being compared in a similarity calculation. New types of similarity measure are then compared with current approaches, and examples are given of several applications related to similarity searching.
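The two ingredients named above, a structural representation and a coefficient, can be illustrated with a minimal sketch. It assumes each molecule is represented as a set of fragment labels and uses the Tanimoto coefficient, a widely used choice for fragment-based measures; the abstract itself does not name a coefficient, so that choice is an assumption. The function names, fragment labels, and toy database are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto coefficient: shared fragments / fragments in either molecule."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fragment sets
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def rank_by_similarity(
    target: Set[str], database: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Rank database molecules in decreasing similarity to the target."""
    scores = [(name, tanimoto(target, frags)) for name, frags in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical fragment sets; real systems derive these from the structure.
query = {"C-C", "C=O", "C-N", "c:c"}
toy_db = {
    "mol_a": {"C-C", "C=O", "C-N"},
    "mol_b": {"c:c", "C-O"},
    "mol_c": {"C-C", "C=O", "C-N", "c:c"},
}
ranking = rank_by_similarity(query, toy_db)
```

Unlike a substructure search, which returns only molecules that contain the query pattern, this kind of search returns every database molecule in ranked order, so the user inspects the top of the list.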
