Surrogate docking: structure-based virtual screening at high throughput speed

Summary

Structure-based screening using fully flexible docking is still too slow for large molecular libraries. High-quality docking of a million-molecule library can take days even on a cluster with hundreds of CPUs. This performance issue prohibits the use of fully flexible docking in the design of large combinatorial libraries. We have developed a fast structure-based screening method that docks a limited number of compounds and uses the results to build a 2D QSAR model, which then rapidly scores the rest of the database. We compare here a model based on radial basis functions and a Bayesian categorization model. The number of compounds that actually need to be docked depends on the number of docking hits found. In our case studies, models of reasonable quality are built once enough molecules have been docked to yield 50 docking hits. The rest of the library is screened by the QSAR model. Optionally, a fraction of the QSAR-prioritized library can be docked in order to find the true docking hits. The quality of the model depends only on the training set size, not on the size of the library to be screened. Therefore, for larger libraries the method yields a higher gain in speed with no change in performance. Prioritizing a large library with these models provides a significant enrichment in docking hits: the enrichment reaches values of 13 and 35 at the beginning of the score-sorted libraries in our two case studies, the screening of the NCI collection and of a combinatorial library against the CDK2 kinase structure. With such enrichments, only a fraction of the database must actually be docked to find many of the true hits. The throughput of the method allows its use in screening large compound collections and in the design of large combinatorial libraries. The proposed strategy has an important effect on efficiency but does not affect the retrieval of actives, which is determined by the quality of the docking method itself.
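The workflow described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not specify the descriptors, the radial basis function model, or the Bayesian categorization model, so this sketch swaps in a plain naive Bayes scorer with add-one smoothing. The function names, the set-of-features "fingerprints", the toy docking function `toy_dock`, the hit cutoff, and the definition of enrichment (hit rate in the top fraction divided by the overall hit rate) are all assumptions made for illustration.

```python
import math
import random


def dock_until_hits(library, dock, hit_cutoff, n_hits=50, seed=0):
    """Dock randomly chosen molecules until n_hits docking hits are found."""
    rng = random.Random(seed)
    order = list(range(len(library)))
    rng.shuffle(order)
    docked = {}  # library index -> docking score (lower is better here)
    found = 0
    for idx in order:
        docked[idx] = dock(library[idx])
        if docked[idx] <= hit_cutoff:
            found += 1
            if found >= n_hits:
                break
    return docked


def train_naive_bayes(fingerprints, labels):
    """Per-feature log-odds weights with add-one smoothing (assumed model)."""
    n_hit = sum(labels)
    n_miss = len(labels) - n_hit
    hit_counts, miss_counts = {}, {}
    for fp, label in zip(fingerprints, labels):
        counts = hit_counts if label else miss_counts
        for feature in fp:
            counts[feature] = counts.get(feature, 0) + 1
    weights = {}
    for feature in set(hit_counts) | set(miss_counts):
        p_hit = (hit_counts.get(feature, 0) + 1) / (n_hit + 2)
        p_miss = (miss_counts.get(feature, 0) + 1) / (n_miss + 2)
        weights[feature] = math.log(p_hit / p_miss)
    return weights


def score(weights, fp):
    """Surrogate score: sum of weights of the features present."""
    return sum(weights.get(feature, 0.0) for feature in fp)


def enrichment_factor(ranked_labels, top_fraction):
    """Hit rate in the top fraction divided by the overall hit rate."""
    n_top = max(1, int(len(ranked_labels) * top_fraction))
    top_rate = sum(ranked_labels[:n_top]) / n_top
    base_rate = sum(ranked_labels) / len(ranked_labels)
    return top_rate / base_rate


# Hypothetical library: each molecule is a set of 8 feature ids out of 40.
rng = random.Random(1)
library = [frozenset(rng.sample(range(40), 8)) for _ in range(5000)]
HIT_CUTOFF = -8.0  # assumed score threshold for a "docking hit"


def toy_dock(fp):
    """Stand-in for a real docking run: features 0 and 1 together bind well."""
    return -10.0 if {0, 1} <= fp else -5.0


# 1. Dock a random subset until 50 hits are found, then train the surrogate.
docked = dock_until_hits(library, toy_dock, HIT_CUTOFF, n_hits=50)
train_idx = list(docked)
weights = train_naive_bayes(
    [library[i] for i in train_idx],
    [docked[i] <= HIT_CUTOFF for i in train_idx],
)

# 2. Rank the undocked remainder with the surrogate model only.
rest = [i for i in range(len(library)) if i not in docked]
rest.sort(key=lambda i: score(weights, library[i]), reverse=True)

# 3. Check the ranking against "true" docking labels (cheap only in this toy).
ranked_labels = [toy_dock(library[i]) <= HIT_CUTOFF for i in rest]
ef = enrichment_factor(ranked_labels, top_fraction=0.01)
print(f"docked {len(docked)} of {len(library)}; enrichment in top 1%: {ef:.1f}")
```

In a real application only the top of the ranked list would be passed back to the docking program, which is where the speed gain comes from; the toy example docks everything in step 3 purely to measure enrichment.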
