Introduction to Similarity Searching in Chemistry

The similarity concept and its database implementation, similarity searching, are reviewed in the context of chemoinformatics. Similarity is defined in terms of matches or overlap, and dissimilarity in terms of mismatches or differences, for both qualitative and quantitative characteristics. Similarity, dissimilarity, and composite measures are constructed from similarity and/or dissimilarity components; asymmetric measures are constructed by weighting the dissimilarity components unequally. Either whole objects or local regions of them are compared, yielding global or local similarity, respectively. Asymmetric local similarity is obtained by treating the objects in the comparison unequally, e.g. by ignoring parts of them. Global characteristics provide overall descriptions of objects, whereas local characteristics provide sufficient locational information for object alignment or superposition to be carried out. Similar objects are likely to have similar properties (the similar property principle). In chemical similarity searching, the objects of interest may be molecules, fragments of molecules, reactions, mixtures, journal articles, etc. The selection of characteristics and their encoding is illustrated using the atom pair and topological torsion descriptors, as well as variants of them with increased fuzziness. Selecting a similarity measure is still very much a matter of trial and error. Standard query object specification is made easier by query by example; multiple searches using a single query yield a highly informative hyperlinked screen; and joint queries involve more than one object. Similarity scores illustrate the results of similarity searches and measures of their effectiveness. Areas of application include direct and reverse property prediction, data mining, virtual screening, diversity analysis, pharmacophore searching, ligand docking, structure elucidation, pattern matching, and signature analysis.
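The construction of measures from match and mismatch components can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses the Tversky form of a weighted set similarity, which the abstract does not name, and the function name `weighted_similarity`, the weights `alpha` and `beta`, and the feature labels are all hypothetical. Equal weights of 1 give the Tanimoto coefficient and equal weights of 0.5 give the Dice coefficient; unequal weights give an asymmetric measure.

```python
def weighted_similarity(a, b, alpha=1.0, beta=1.0):
    """Tversky-style similarity between two feature sets (an assumed form).

    common        -> similarity component (features present in both objects)
    only_a/only_b -> dissimilarity components (features present in one only)
    alpha, beta   -> weights on the two dissimilarity components
    """
    a, b = set(a), set(b)
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denominator = common + alpha * only_a + beta * only_b
    # Convention chosen here: a zero denominator (e.g. two empty sets) scores 0.
    return common / denominator if denominator else 0.0


# Hypothetical feature labels, loosely in the spirit of atom-pair descriptors.
query = {"C-C-2", "C-N-3", "C-O-4", "N-O-5"}
target = query | {"C-S-3", "O-S-6"}  # the target contains every query feature

tanimoto = weighted_similarity(query, target, 1.0, 1.0)
dice = weighted_similarity(query, target, 0.5, 0.5)
# Unequal weights: features found only in the second object are ignored,
# so the score depends on which object is given first.
query_in_target = weighted_similarity(query, target, 1.0, 0.0)
target_in_query = weighted_similarity(target, query, 1.0, 0.0)
```

With these inputs the asymmetric score is maximal when the query is compared against the target and lower in the reverse direction, which is the behaviour expected when parts of one object are ignored in the comparison.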
