Lingos, Finite State Machines, and Fast Similarity Searching

We apply a recently published method of text-based molecular similarity searching (LINGO) to standard data sets for the purpose of quantifying the accuracy of the approach. Our implementation is based on a pattern-matching finite state machine (FSM) which results in fast search times. The accuracy of LINGO is demonstrated to be comparable to that of a path-based fingerprint and offers a simple yet effective method for similarity searching.

[1]  Andreas Bender,et al.  A Discussion of Measures of Enrichment in Virtual Screening: Comparing the Information Content of Descriptors with Increasing Levels of Sophistication , 2005, J. Chem. Inf. Model..

[2]  Pierre Baldi,et al.  Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity , 2005, ISMB.

[3]  Jérôme Hert,et al.  Comparison of Fingerprint-Based Methods for Virtual Screening Using Multiple Bioactive Reference Structures , 2004, J. Chem. Inf. Model..

[4]  U. Lessel,et al.  In vitro and in silico affinity fingerprints: Finding similarities beyond structural classes , 2000 .

[5]  Robert D Clark,et al.  Neighborhood behavior: a useful concept for validation of "molecular diversity" descriptors. , 1996, Journal of medicinal chemistry.

[6]  John M. Barnard,et al.  Chemical Similarity Searching , 1998, J. Chem. Inf. Comput. Sci..

[7]  Thierry Kogej,et al.  Multifingerprint Based Similarity Searches for Targeted Class Compound Selection , 2006, J. Chem. Inf. Model..

[8]  Jérôme Hert,et al.  New Methods for Ligand-Based Virtual Screening: Use of Data Fusion and Machine Learning to Enhance the Effectiveness of Similarity Searching , 2006, J. Chem. Inf. Model..

[9]  P. Willett,et al.  Combination of molecular similarity measures using data fusion , 2000 .

[10]  Tudor I. Oprea,et al.  Is There a Difference Between Leads and Drugs? A Historical Perspective. , 2001 .

[11]  Winston Hide,et al.  Biological Evaluation of d2, an Algorithm for High-Performance Sequence Comparison , 1994, J. Comput. Biol..

[12]  Ramaswamy Nilakantan,et al.  New method for rapid characterization of molecular shapes: applications in drug design , 1993, J. Chem. Inf. Comput. Sci..

[13]  M. Murcko,et al.  Consensus scoring: A method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins. , 1999, Journal of medicinal chemistry.

[14]  David Weininger,et al.  SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules , 1988, J. Chem. Inf. Comput. Sci..

[15]  Robert P Sheridan,et al.  Why do we need so many chemical similarity search methods? , 2002, Drug discovery today.

[16]  David Vidal,et al.  LINGO, an Efficient Holographic Text Based Method To Calculate Biophysical Properties and Intermolecular Similarities , 2005, J. Chem. Inf. Model..

[17]  Irwin D. Kuntz,et al.  A fast and efficient method for 2D and 3D molecular shape description , 1992, J. Comput. Aided Mol. Des..

[18]  David Vidal,et al.  A Novel Search Engine for Virtual Screening of Very Large Databases , 2006, J. Chem. Inf. Model..

[19]  D. Davison,et al.  A measure of DNA sequence dissimilarity based on Mahalanobis distance between frequencies of words. , 1997, Biometrics.

[20]  M. Congreve,et al.  Fragment-based lead discovery , 2004, Nature Reviews Drug Discovery.

[21]  Jonas S. Almeida,et al.  Alignment-free sequence comparison-a review , 2003, Bioinform..

[22]  David Weininger,et al.  SMILES. 2. Algorithm for generation of unique SMILES notation , 1989, J. Chem. Inf. Comput. Sci..

[23]  Peter Willett,et al.  Similarity searching in files of three-dimensional chemical structures: Comparison of fragment-based measures of shape similarity , 1994, J. Chem. Inf. Comput. Sci..