Uniform Coverage Designs for Molecule Selection

In screening for drug discovery, chemists often select a large subset of molecules from a very large database (e.g., 1,000 molecules from 100,000). To generate diverse leads for drug optimization, highly active compounds in several structurally different chemical classes are sought. Molecules can be characterized by numerical descriptors, and the chosen subset should cover the descriptor space or the subspaces formed by several descriptors. We propose a method that concentrates on low-dimensional subspaces, a criterion for uniformity of coverage, and a fast exchange algorithm to optimize the criterion. These methods are illustrated using a National Cancer Institute database.
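The abstract does not state the uniformity criterion or the details of the exchange algorithm, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes descriptors scaled to [0, 1), bins each descriptor into equal-width cells, scores a subset by the squared deviation of cell counts from a uniform target over all 1-D and 2-D subspaces, and improves the subset by simple point exchanges. The names `coverage_score` and `exchange_select`, the binning scheme, and the parameters `n_bins` and `sweeps` are all hypothetical.

```python
import itertools
import random


def bin_index(value, n_bins):
    # Assumption: descriptor values are already scaled to [0, 1).
    return min(int(value * n_bins), n_bins - 1)


def coverage_score(points, n_bins=4):
    """Illustrative criterion (not the paper's): sum, over all 1-D and 2-D
    descriptor subspaces, of squared deviations of cell counts from the
    count a perfectly uniform subset would give. Lower is more uniform."""
    n, d = len(points), len(points[0])
    subspaces = [(j,) for j in range(d)]
    subspaces += list(itertools.combinations(range(d), 2))
    score = 0.0
    for dims in subspaces:
        counts = {}
        for p in points:
            cell = tuple(bin_index(p[j], n_bins) for j in dims)
            counts[cell] = counts.get(cell, 0) + 1
        n_cells = n_bins ** len(dims)
        target = n / n_cells
        # Occupied cells, then the penalty for cells left empty.
        score += sum((c - target) ** 2 for c in counts.values())
        score += (n_cells - len(counts)) * target ** 2
    return score


def exchange_select(candidates, k, n_bins=4, sweeps=3, seed=0):
    """Start from a random subset of size k, then repeatedly swap one
    chosen point for one unchosen point whenever the swap lowers the score."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(candidates)), k)
    best = coverage_score([candidates[i] for i in chosen], n_bins)
    for _ in range(sweeps):
        improved = False
        for pos in range(k):
            for cand in range(len(candidates)):
                if cand in chosen:
                    continue
                trial = chosen[:pos] + [cand] + chosen[pos + 1:]
                s = coverage_score([candidates[i] for i in trial], n_bins)
                if s < best:  # accept only strict improvements
                    chosen, best = trial, s
                    improved = True
        if not improved:
            break  # no exchange helps: a local optimum of this criterion
    return chosen, best


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-in for a descriptor table: 200 molecules, 3 descriptors.
    database = [[rng.random() for _ in range(3)] for _ in range(200)]
    subset, score = exchange_select(database, k=16, n_bins=4)
    print(sorted(subset), score)
```

This sketch recomputes the full score for every trial swap, which is simple but slow. A fast exchange algorithm would presumably update only the cell counts affected by the two swapped points, and restricting the score to low-dimensional subspaces keeps the number of cells manageable as the number of descriptors grows.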
