How similar are those molecules after all? Use two descriptors and you will have three different answers

Importance of the field: Molecular similarity searching (ligand-based virtual screening) is a routine computational technique in drug discovery and pharmacological research. However, although a large number of descriptors exist, there are no general guidelines on which descriptors perform better or which should be used in a given case. Areas covered in this review: This review provides a brief overview of current molecular descriptors and of the databases used to evaluate them, followed by a critical discussion of their differences. What the reader will gain: The reader will learn how differently molecular descriptors assess the similarity of molecules, and what performance can realistically be expected from them. Take home message: Molecular descriptors come in a variety of forms and differ vastly in how they assess the similarity between molecules. The virtual screening performance of many descriptors is often lower than expected when compared with ‘dumb’ descriptors, while some simple methods, such as circular fingerprints, perform surprisingly well in many cases. The choice of the right benchmark library is crucial, and many such libraries are summarized in this review.
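As a rough illustration of how two descriptors can disagree about the same pair of molecules, the sketch below scores one pair with two toy feature sets. Everything in it is an assumption made for illustration: the Tanimoto coefficient is a widely used similarity measure but is not named in the abstract, and the two "descriptors" (the set of characters in a SMILES string, and the set of adjacent character pairs) are crude stand-ins for real fingerprints, not any method evaluated in the review.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def char_features(smiles: str) -> set:
    """Toy descriptor 1 (hypothetical): the set of characters in the SMILES string."""
    return set(smiles)


def bigram_features(smiles: str) -> set:
    """Toy descriptor 2 (hypothetical): the set of adjacent character pairs."""
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}


# Two small molecules written as SMILES strings.
mol_a = "CCOC"  # ethyl methyl ether
mol_b = "CCCO"  # 1-propanol

# The same pair scored under each toy descriptor.
sim_chars = tanimoto(char_features(mol_a), char_features(mol_b))
sim_bigrams = tanimoto(bigram_features(mol_a), bigram_features(mol_b))

print(f"character sets: {sim_chars:.2f}")
print(f"bigram sets:    {sim_bigrams:.2f}")
```

Under the character-set descriptor the two molecules look identical, because both contain only C and O, while the bigram descriptor separates them, because only the ether contains the pair "OC". Real descriptors are far richer, but the same effect, a different similarity value for the same pair, is what the abstract describes.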
