Applications of random sampling to virtual screening of combinatorial libraries.

We describe statistical techniques for effective evaluation of large virtual combinatorial libraries (more than 10^10 potential compounds). The methods are used for computationally evaluating templates (prioritizing candidate libraries for synthesis and screening) and for designing individual combinatorial libraries (e.g., for a given diversity site, reagents can be selected based on the estimated frequency with which they appear in products that pass a computational filter). These statistical methods are powerful because they estimate the properties of the overall library without explicitly enumerating all of the possible products. In addition, they are fast, and the amount of sampling required to achieve a desired precision is calculable. In this article, we discuss the computational methods that allow random product selection from a combinatorial library and the statistics involved in estimating the errors in quantities obtained from such samples. We then describe three examples: (1) an estimate of average molecular weight for the several billion possible products in a four-component Ugi reaction, a quantity that can be calculated exactly for comparison; (2) the prioritization of four templates for combinatorial synthesis using a computational filter based on four-point pharmacophores; and (3) selection of reagents for the four-component Ugi reaction based on their frequency of occurrence in products that pass a pharmacophore filter.
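
The first example can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each diversity site is represented only by a list of reagent molecular weights, that the product weight is the sum of the four reagent weights minus one water (18.015), and that reagent weights are made-up values drawn uniformly between 50 and 250. All function names are hypothetical. Choosing one reagent per site independently and uniformly gives a uniform random product, so the sample mean and its standard error can be checked against the exact mean, which here is just the sum of the per-site averages.

```python
import math
import random

WATER = 18.015  # assumption: one water is lost when the four reagents condense


def random_product(reagent_lists, rng):
    """Pick one reagent index per diversity site, uniformly and independently."""
    return tuple(rng.randrange(len(site)) for site in reagent_lists)


def product_mw(reagent_lists, choice):
    """Assumed product weight: sum of the chosen reagent weights minus water."""
    return sum(site[i] for site, i in zip(reagent_lists, choice)) - WATER


def estimate_mean_mw(reagent_lists, n_samples, rng):
    """Return (sample mean, standard error of the mean) over random products."""
    values = [
        product_mw(reagent_lists, random_product(reagent_lists, rng))
        for _ in range(n_samples)
    ]
    mean = sum(values) / n_samples
    variance = sum((v - mean) ** 2 for v in values) / (n_samples - 1)
    return mean, math.sqrt(variance / n_samples)


def exact_mean_mw(reagent_lists):
    """Exact library mean without enumeration: sum of per-site averages."""
    return sum(sum(site) / len(site) for site in reagent_lists) - WATER


def samples_needed(std_dev, half_width, z=1.96):
    """Sample size for a normal-approximation interval of the given half-width."""
    return math.ceil((z * std_dev / half_width) ** 2)


# Hypothetical library: 4 sites x 200 reagents = 1.6 billion virtual products.
rng = random.Random(0)
sites = [[rng.uniform(50.0, 250.0) for _ in range(200)] for _ in range(4)]

estimate, std_err = estimate_mean_mw(sites, 10_000, rng)
exact = exact_mean_mw(sites)
print(f"estimated mean MW = {estimate:.2f} +/- {1.96 * std_err:.2f}")
print(f"exact mean MW     = {exact:.2f}")
```

The `samples_needed` helper shows why the required sampling is calculable: under the normal approximation, the sample size depends only on the spread of the property and the target precision, not on the size of the library.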
