Chapter 9 Molecular Similarity: Advances in Methods, Applications and Validations in Virtual Screening and QSAR

Publisher Summary: This chapter reviews recent developments in areas that exploit the molecular similarity principle, including new descriptors designed to capture molecular properties. It focuses on a crucial aspect of computational models, namely their validity, and discusses additional ways to examine available data, such as those from high-throughput screening (HTS) campaigns, and to gain more knowledge from them. The chapter also presents recent applications of the methods discussed, focusing on successes in virtual screening, database clustering and comparisons (such as drug-likeness and in-house-likeness), and recent large-scale validations of docking and scoring programs. While a great number of descriptors and modeling methods have been proposed to date, the recent trend toward proper model validation is a welcome development. Some limitations of these methods surely stem from their underlying principles and from the fundamental concepts they rest on; others will certainly be eliminated in the future.
