Assessing the Scaffold Diversity of Screening Libraries

Medicinal chemists have traditionally carried out assessments of chemical diversity and the subsequent compound acquisition, although a recent study suggests that experts are usually inconsistent when reviewing large data sets. To analyze the scaffold diversity of commercially available screening collections, we have developed a general workflow aimed at (1) identifying druglike compounds, (2) clustering them by maximum common substructures (scaffolds), (3) measuring the scaffold diversity encoded by each screening collection independently of its size, and finally (4) merging all common substructures into a nonredundant scaffold library that can easily be browsed by structural and topological queries. Starting from 2.4 million compounds from 12 commercial sources, we identified four categories of libraries: large- and medium-sized combinatorial libraries (low scaffold diversity), diverse libraries (medium diversity, medium size), and highly diverse libraries (high diversity, small size). The chemical space covered by the scaffold library can be searched to prioritize scaffold-focused libraries.
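Steps (3) and (4) of the workflow can be illustrated with a minimal sketch. It assumes each compound has already been assigned a scaffold identifier (for example a canonical string), which is not something the abstract specifies. The statistic used here, the percentage of scaffolds needed to cover half of the compounds in a library, is one possible size-normalized measure chosen for illustration and is not necessarily the metric used in this work. The function names are hypothetical.

```python
from collections import Counter


def scaffold_share_for_half_coverage(scaffold_ids):
    """Percentage of distinct scaffolds needed to cover 50% of the compounds.

    Assumption: scaffold_ids holds one scaffold identifier per compound.
    A low value means a few scaffolds dominate the library (low diversity);
    the value is a ratio, so it does not grow with library size by itself.
    """
    counts = Counter(scaffold_ids)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no compounds")
    covered = 0
    needed = 0
    # Walk scaffolds from most to least populated until half is covered.
    for _, n_compounds in counts.most_common():
        covered += n_compounds
        needed += 1
        if 2 * covered >= total:
            break
    return 100.0 * needed / len(counts)


def merge_scaffolds(libraries):
    """Merge per-library scaffold lists into one nonredundant scaffold set."""
    merged = set()
    for scaffold_ids in libraries.values():
        merged.update(scaffold_ids)
    return merged


if __name__ == "__main__":
    # Toy data: hypothetical scaffold labels, not real suppliers or structures.
    libraries = {
        "combinatorial": ["A"] * 8 + ["B", "C"],
        "diverse": ["A", "B", "C", "D"],
    }
    for name, ids in libraries.items():
        print(name, round(scaffold_share_for_half_coverage(ids), 1))
    print(sorted(merge_scaffolds(libraries)))
```

In the toy data, the library dominated by a single scaffold scores lower than the one whose compounds are spread evenly over its scaffolds, which matches the qualitative distinction the abstract draws between combinatorial and diverse collections.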
