New Methods for Ligand-Based Virtual Screening: Use of Data Fusion and Machine Learning to Enhance the Effectiveness of Similarity Searching

Similarity searching using a single bioactive reference structure is a well-established technique for accessing chemical structure databases. This paper describes two extensions of the basic approach. First, we discuss the use of group fusion to combine the results of similarity searches when multiple reference structures are available. We demonstrate that this technique is notably more effective than conventional similarity searching in scaffold-hopping searches for structurally diverse sets of active molecules; conversely, the technique will do little to improve the search performance if the actives are structurally homogeneous. Second, we make the assumption that the nearest neighbors resulting from a similarity search, using a single bioactive reference structure, are also active and use this assumption to implement approximate forms of group fusion, substructural analysis, and binary kernel discrimination. This approach, called turbo similarity searching, is notably more effective than conventional similarity searching.

[1]  Robert P. Sheridan,et al.  Similarity to Molecules in the Training Set Is a Good Discriminator for Prediction Accuracy in QSAR , 2004, J. Chem. Inf. Model..

[2]  David T. Stanton,et al.  Application of Nearest-Neighbor and Cluster Analyses in Pharmaceutical Lead Discovery , 1999, J. Chem. Inf. Comput. Sci..

[3]  Jérôme Hert,et al.  Comparison of Fingerprint-Based Methods for Virtual Screening Using Multiple Bioactive Reference Structures , 2004, J. Chem. Inf. Model..

[4]  Peter Willett,et al.  Evaluation of Similarity Measures for Searching the Dictionary of Natural Products Database , 2003, J. Chem. Inf. Comput. Sci..

[5]  Stephen E. Robertson,et al.  Relevance weighting of search terms , 1976, J. Am. Soc. Inf. Sci..

[6]  Kitsuchart Pasupa,et al.  Virtual Screening Using Binary Kernel Discrimination: Effect of Noisy Training Data and the Optimization of Performance , 2006, J. Chem. Inf. Model..

[7]  Peter Willett,et al.  Implementation of nearest-neighbor searching in an online chemical structure search system , 1986, J. Chem. Inf. Comput. Sci..

[8]  J. Jenkins,et al.  A 3D similarity method for scaffold hopping from known drugs or natural ligands to new chemotypes. , 2004, Journal of medicinal chemistry.

[9]  Peter Willett,et al.  Comparison of fragment weighting schemes for substructural analysis , 1989 .

[10]  Peter Willett,et al.  Comparison of Ranking Methods for Virtual Screening in Lead-Discovery Programs , 2003, J. Chem. Inf. Comput. Sci..

[11]  P. Willett,et al.  Enhancing the effectiveness of similarity-based virtual screening using nearest-neighbor information. , 2005, Journal of medicinal chemistry.

[12]  Xin Chen,et al.  Performance of Similarity Measures in 2D Fragment-Based Similarity Searching: Comparison of Structural Descriptors and Similarity Coefficients , 2002, J. Chem. Inf. Comput. Sci..

[13]  Michael S Lajiness,et al.  Assessment of the consistency of medicinal chemists in reviewing sets of compounds. , 2004, Journal of medicinal chemistry.

[14]  Yvonne C. Martin,et al.  Use of Structure-Activity Data To Compare Structure-Based Clustering Methods and Descriptors for Use in Compound Selection , 1996, J. Chem. Inf. Comput. Sci..

[15]  Michael K. Gilson,et al.  Virtual Screening of Molecular Databases Using a Support Vector Machine , 2005, J. Chem. Inf. Model..

[16]  P. Willett,et al.  Comparison of topological descriptors for similarity-based virtual screening using multiple bioactive reference structures. , 2004, Organic & biomolecular chemistry.

[17]  Schmid,et al.  "Scaffold-Hopping" by Topological Pharmacophore Search: A Contribution to Virtual Screening. , 1999, Angewandte Chemie.

[18]  John M. Barnard,et al.  Chemical Similarity Searching , 1998, J. Chem. Inf. Comput. Sci..

[19]  Peter Willett,et al.  Enhancing the Effectiveness of Virtual Screening by Fusing Nearest Neighbor Lists: A Comparison of Similarity Coefficients , 2004, J. Chem. Inf. Model..

[20]  Robert P Sheridan,et al.  Why do we need so many chemical similarity search methods? , 2002, Drug discovery today.

[21]  Robert J. Jilek,et al.  "Lead hopping". Validation of topomer similarity as a superior predictor of similar biological activities. , 2004, Journal of medicinal chemistry.

[22]  Hugo Kubinyi,et al.  Similarity and Dissimilarity: A Medicinal Chemist’s View , 2002 .

[23]  R. Venkataraghavan,et al.  Atom pairs as molecular features in structure-activity studies: definition and applications , 1985, J. Chem. Inf. Comput. Sci..

[24]  Peter C Jurs,et al.  Assessing the reliability of a QSAR model's predictions. , 2005, Journal of molecular graphics & modelling.

[25]  P. Willett,et al.  Combination of molecular similarity measures using data fusion , 2000 .

[26]  C E Berkoff,et al.  Substructural analysis. A novel approach to the problem of drug design. , 1974, Journal of medicinal chemistry.

[27]  Yutaka Endo,et al.  Development of a Method for Evaluating Drug-Likeness and Ease of Synthesis Using a Data Set in Which Compounds Are Assigned Scores Based on Chemists' Intuition , 2003, J. Chem. Inf. Comput. Sci..

[28]  Y. Martin,et al.  Do structurally similar molecules have similar biological activity? , 2002, Journal of medicinal chemistry.

[29]  Anthony E. Klon,et al.  Finding more needles in the haystack: A simple and efficient method for improving high-throughput docking results. , 2004, Journal of medicinal chemistry.

[30]  Marvin Johnson,et al.  Concepts and applications of molecular similarity , 1990 .

[31]  M. Murcko,et al.  Consensus scoring: A method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins. , 1999, Journal of medicinal chemistry.

[32]  P Willett,et al.  SLASH: a program for analysing the functional groups in molecules. , 1998, Journal of molecular graphics & modelling.

[33]  Ruedi Stoop,et al.  An Ontology for Pharmaceutical Ligands and Its Application for in Silico Screening and Library Design , 2002, J. Chem. Inf. Comput. Sci..

[34]  Andreas Bender,et al.  Molecular Similarity Searching Using Atom Environments, Information-Based Feature Selection, and a Naïve Bayesian Classifier , 2004, J. Chem. Inf. Model..

[35]  Robert D Clark,et al.  Neighborhood behavior: a useful concept for validation of "molecular diversity" descriptors. , 1996, Journal of medicinal chemistry.

[36]  N. Nikolova,et al.  International Union of Pure and Applied Chemistry, LUMO energy ± The Lowest Unoccupied Molecular Orbital (LUMO) , 2022 .

[37]  R. Glen,et al.  Molecular similarity: a key technique in molecular informatics. , 2004, Organic & biomolecular chemistry.

[38]  M. Stahl,et al.  Scaffold hopping. , 2004, Drug discovery today. Technologies.

[39]  Xiaoyang Xia,et al.  Classification of kinase inhibitors using a Bayesian model. , 2004, Journal of medicinal chemistry.

[40]  Darren V. S. Green,et al.  Prediction of Biological Activity for High-Throughput Screening Using Binary Kernel Discrimination , 2001, J. Chem. Inf. Comput. Sci..

[41]  Peter Willett,et al.  Scaffold Searching: Automated Identification of Similar Ring Systems for the Design of Combinatorial Libraries , 2002 .

[42]  E. LaVoie,et al.  Bioisosterism: A Rational Approach in Drug Design. , 1996, Chemical reviews.