Analysis and comparison of 2D fingerprints: insights into database screening performance using eight fingerprint methods

Virtual screening is a widely used strategy in modern drug discovery, and 2D fingerprint similarity is an important tool that has been successfully applied to retrieve active compounds from large datasets. However, it is not always straightforward to select an appropriate fingerprint method and associated settings for a given problem. Here, we applied eight different fingerprint methods, as implemented in the new cheminformatics package Canvas, to a well-validated dataset covering five targets. The fingerprint methods are Linear, Dendritic, Radial, MACCS, MOLPRINT2D, Pairwise, Triplet, and Torsion. We find that most fingerprints have similar retrieval rates on average; however, each has special characteristics that distinguish its performance on different query molecules and ligand sets. For example, some fingerprints exhibit a significant dependency on ligand size, whereas others are more robust with respect to variations in the query or active compounds. In cases where little information is known about the active ligands, MOLPRINT2D fingerprints retrieve the highest number of actives on average. When multiple queries are available, we find that a fingerprint averaged over all query molecules is generally superior to fingerprints derived from single queries. Finally, a complementarity metric is proposed to determine which fingerprint methods can be combined to improve screening results.
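The idea of screening with a fingerprint averaged over several queries can be illustrated with a minimal Python sketch. It is not the Canvas implementation, and the abstract does not specify the exact averaging scheme or similarity metric, so several assumptions are made here: fingerprints are represented as sets of integer "on" bit indices, the averaged fingerprint stores the fraction of queries that set each bit, and similarity is scored with the Tanimoto coefficient (its continuous form for the averaged case). All function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Binary Tanimoto coefficient between two sets of 'on' bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_fingerprint(queries: List[Set[int]]) -> Dict[int, float]:
    """Assumed averaging scheme: weight of a bit = fraction of queries that set it."""
    n = len(queries)
    weights: Dict[int, float] = {}
    for fp in queries:
        for bit in fp:
            weights[bit] = weights.get(bit, 0.0) + 1.0 / n
    return weights


def weighted_tanimoto(avg: Dict[int, float], fp: Set[int]) -> float:
    """Continuous Tanimoto: dot / (|avg|^2 + |fp|^2 - dot), with fp treated as 0/1."""
    dot = sum(avg.get(bit, 0.0) for bit in fp)
    denom = sum(w * w for w in avg.values()) + float(len(fp)) - dot
    return dot / denom if denom > 0 else 0.0


def rank_database(avg: Dict[int, float],
                  database: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    """Score every database compound against the averaged query, best first."""
    scored = [(name, weighted_tanimoto(avg, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data only; these bit sets do not correspond to real molecules.
    queries = [{1, 2, 3}, {2, 3, 4}]
    database = {"cmpd_a": {2, 3}, "cmpd_b": {7, 8}, "cmpd_c": {1, 2, 3, 4}}
    for name, score in rank_database(average_fingerprint(queries), database):
        print(f"{name}: {score:.3f}")
```

In this sketch, bits shared by all queries carry full weight while bits seen in only some queries are down-weighted, which is one plausible reason an averaged query could be less sensitive to the idiosyncrasies of any single query molecule.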
