Chemistry Space Metrics in Diversity Analysis, Library Design, and Compound Selection

DiverseSolutions software was used to generate a “universal” chemistry space that can serve as a standard for profiling most structural sets of interest. A nonlinear method was developed for assigning structures to bins within chemistry space descriptors. This method allows a chemistry space to be scaled to include every structure in a set while maintaining a reasonable distribution of structures across bins and achieving target percentage cell occupancies. The universal chemistry space and the nonlinear binning method were validated using random structures extracted from the Beilstein database. The approach was then used, in conjunction with other diversity analyses, for diverse subset selection and for comparing compound collections.
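The abstract does not state which nonlinear binning scheme was used, so the following Python sketch is only an illustration of the general idea and not the authors' method. It assumes equal-frequency (quantile) bin edges as a stand-in for "nonlinear" binning and contrasts them with equal-width (linear) edges on a single synthetic, skewed descriptor. All function names and the log-normal test data are hypothetical; a real chemistry space has several descriptor axes, with cells formed by combining the bins on each axis.

```python
import bisect
import random
from collections import Counter


def linear_edges(values, n_bins):
    """Equal-width interior bin edges spanning the full range of the data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def quantile_edges(values, n_bins):
    """Interior edges at evenly spaced quantiles (equal-frequency bins).

    Assumption: used here as one possible nonlinear scheme; the source
    does not specify its binning function.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def assign_bins(values, edges):
    """Map each value to a bin index in the range 0..len(edges)."""
    return [bisect.bisect_right(edges, v) for v in values]


def occupancy(bin_indices, n_bins):
    """Fraction of bins that hold at least one structure."""
    return len(set(bin_indices)) / n_bins


# Hypothetical skewed descriptor: most structures cluster at low values,
# with a long tail of outliers that stretches the axis.
random.seed(0)
descriptor = [random.lognormvariate(0.0, 1.0) for _ in range(1000)]
n_bins = 10

linear_bins = assign_bins(descriptor, linear_edges(descriptor, n_bins))
quantile_bins = assign_bins(descriptor, quantile_edges(descriptor, n_bins))

print("linear   occupancy:", occupancy(linear_bins, n_bins),
      "largest bin:", max(Counter(linear_bins).values()))
print("quantile occupancy:", occupancy(quantile_bins, n_bins),
      "largest bin:", max(Counter(quantile_bins).values()))
```

Both schemes cover the full range of the descriptor, so no structure falls outside the space. The difference is in how evenly the structures spread: equal-width bins let outliers crowd most structures into a few bins, whereas the quantile edges keep the bins comparably populated.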
