Application of Machine Learning To Improve the Results of High-Throughput Docking Against the HIV-1 Protease

We have previously reported that the application of a Laplacian-modified naive Bayesian (NB) classifier may be used to improve the ranking of known inhibitors from a random database of compounds after High-Throughput Docking (HTD). The method relies upon the frequency of substructural features among the active and inactive compounds from 2D fingerprint information of the compounds. Here we present an investigation of the role of extended connectivity fingerprints in training the NB classifier against HTD studies on the HIV-1 protease using three docking programs: Glide, FlexX, and GOLD. The results show that the performance of the NB classifier is due to the presence of a large number of features common to the set of known active compounds rather than a single structural or substructural scaffold. We demonstrate that the Laplacian-modified naive Bayesian classifier trained with data from high-throughput docking is superior at identifying active compounds from a target database in comparison to conventional two-dimensional substructure search methods alone.

[1]  J. Gasteiger,et al.  ITERATIVE PARTIAL EQUALIZATION OF ORBITAL ELECTRONEGATIVITY – A RAPID ACCESS TO ATOMIC CHARGES , 1980 .

[2]  John W. Raymond,et al.  Conditional Probability: A New Fusion Method for Merging Disparate Virtual Screening Results , 2004, J. Chem. Inf. Model..

[3]  David Weininger,et al.  SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules , 1988, J. Chem. Inf. Comput. Sci..

[4]  Shaomeng Wang,et al.  How Does Consensus Scoring Work for Virtual Library Screening? An Idealized Computer Experiment , 2001, J. Chem. Inf. Comput. Sci..

[5]  Wolfgang Jahnke,et al.  Second-site NMR screening and linker design. , 2003, Current topics in medicinal chemistry.

[6]  P Willett,et al.  Development and validation of a genetic algorithm for flexible docking. , 1997, Journal of molecular biology.

[7]  H. L. Morgan The Generation of a Unique Machine Description for Chemical Structures-A Technique Developed at Chemical Abstracts Service. , 1965 .

[8]  Robert M Stroud,et al.  Anisotropic dynamics of the JE-2147-HIV protease complex: drug resistance and thermodynamic binding mode examined in a 1.09 A structure. , 2002, Biochemistry.

[9]  M. Murcko,et al.  Consensus scoring: A method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins. , 1999, Journal of medicinal chemistry.

[10]  Paul Watson,et al.  Virtual Screening Using Protein-Ligand Docking: Avoiding Artificial Enrichment , 2004, J. Chem. Inf. Model..

[11]  Anthony E. Klon,et al.  Finding more needles in the haystack: A simple and efficient method for improving high-throughput docking results. , 2004, Journal of medicinal chemistry.

[12]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[13]  Meir Glick,et al.  Enrichment of Extremely Noisy High-Throughput Screening Data Using a Naïve Bayes Classifier , 2004, Journal of biomolecular screening.

[14]  R. Clark,et al.  Consensus scoring for ligand/protein interactions. , 2002, Journal of molecular graphics & modelling.

[15]  P. Ettmayer,et al.  3D-quantitative structure-activity relationships of human immunodeficiency virus type-1 proteinase inhibitors: comparative molecular field analysis of 2-heterosubstituted statine derivatives-implications for the design of novel inhibitors. , 1995, Journal of medicinal chemistry.

[16]  Hege S. Beard,et al.  Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. , 2004, Journal of medicinal chemistry.

[17]  Richard D. Taylor,et al.  Virtual Screening Using Protein—Ligand Docking: Avoiding Artificial Enrichment. , 2004 .