Extended-Connectivity Fingerprints

Extended-connectivity fingerprints (ECFPs) are a novel class of topological fingerprints for molecular characterization. Historically, topological fingerprints were developed for substructure and similarity searching; ECFPs, by contrast, were developed specifically for structure-activity modeling. ECFPs are circular fingerprints with a number of useful qualities: they can be calculated very rapidly; they are not predefined and can represent an essentially infinite number of different molecular features (including stereochemical information); their features represent the presence of particular substructures, which makes analysis results easier to interpret; and the ECFP algorithm can be tailored to generate different types of circular fingerprints, each optimized for a different use. Although ECFPs have been widely adopted and validated, a description of their implementation has not previously been presented in the literature.
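The core idea behind a circular fingerprint can be illustrated with a small, self-contained sketch. It is not the published ECFP implementation: it assumes a hand-built heavy-atom graph, uses a simplified invariant tuple per atom (here atomic number and heavy-atom degree; the names `hash_ints`, `ecfp_like_identifiers` and `fold` are hypothetical), uses CRC32 as a stand-in hash, and omits stereochemistry and the removal of structurally duplicate features. Each iteration rehashes every atom's identifier together with its sorted (bond order, neighbor identifier) pairs, so the identifiers describe ever-larger circular neighborhoods and are not drawn from a predefined list.

```python
import zlib
from typing import Dict, List, Set, Tuple


def hash_ints(values: Tuple[int, ...]) -> int:
    """Deterministically map a tuple of ints to a 32-bit identifier (CRC32 of its text form)."""
    return zlib.crc32(",".join(str(v) for v in values).encode("ascii"))


def ecfp_like_identifiers(
    atom_invariants: List[Tuple[int, ...]],
    bonds: List[Tuple[int, int, int]],
    radius: int = 2,
) -> Set[int]:
    """Collect circular-substructure identifiers up to `radius` bonds from each atom.

    atom_invariants: one tuple of ints per heavy atom (an assumed, simplified choice;
        swapping in other invariants changes the fingerprint "flavor").
    bonds: (atom_i, atom_j, bond_order) triples.
    Simplifications: no stereochemistry, no removal of structurally duplicate features.
    """
    neighbors: Dict[int, List[Tuple[int, int]]] = {i: [] for i in range(len(atom_invariants))}
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))

    # Iteration 0: each atom's identifier depends only on its own invariants.
    current = [hash_ints(inv) for inv in atom_invariants]
    features: Set[int] = set(current)

    for iteration in range(1, radius + 1):
        updated = []
        for atom, atom_id in enumerate(current):
            # Sort (bond order, neighbor identifier) pairs so the result does not
            # depend on the order in which the atoms were numbered.
            env = sorted((order, current[nbr]) for nbr, order in neighbors[atom])
            flat = [iteration, atom_id] + [x for pair in env for x in pair]
            updated.append(hash_ints(tuple(flat)))
        current = updated  # all atoms are updated synchronously
        features.update(current)
    return features


def fold(features: Set[int], n_bits: int = 1024) -> Set[int]:
    """Fold the unbounded identifier space into on-bit indices of a fixed-length vector."""
    return {f % n_bits for f in features}


if __name__ == "__main__":
    # Ethanol heavy atoms C-C-O with (atomic number, heavy-atom degree) invariants.
    ethanol = [(6, 1), (6, 2), (8, 1)]
    ethanol_bonds = [(0, 1, 1), (1, 2, 1)]
    ids = ecfp_like_identifiers(ethanol, ethanol_bonds, radius=2)
    print(len(ids), sorted(fold(ids, 64)))
```

Because every identifier is derived only from invariants and sorted neighbor lists, renumbering the atoms yields the same identifier set, and each identifier corresponds to a specific substructure that can be traced back for interpretation. Changing the invariant tuple (for example, to pharmacophoric roles instead of element and degree) is one way the same iteration scheme could be tailored to a different use. Folding with a modulus is optional and trades some bit collisions for a fixed-length vector.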