Hole filling and library optimization: application to commercially available fragment libraries.

Compound libraries are an integral component of drug discovery in the pharmaceutical and biotechnology industries. Although in-house libraries often contain millions of molecules, this number pales in comparison to the size of accessible drug-like chemical space. Care must therefore be taken when adding new compounds to an existing library, so that unexplored regions of chemical space are filled efficiently without needlessly increasing the library size. In this work, we present an automated method to fill holes in an existing library using compounds from an external source, and we apply it to commercially available fragment libraries. The method, called Canvas HF, uses distances computed from 2D chemical fingerprints and selects compounds that fill empty regions while avoiding the problem of selecting only compounds at the edges of chemical space. We show that the method is robust with respect to the choice of database and the number of compounds requested. We also present an extension of the method in which chemical properties are considered during selection, biasing the chosen compounds toward a desired property space without imposing hard property cutoffs. We compare Canvas HF with a standard sphere exclusion method and with random compound selection and find that Canvas HF performs favorably. Overall, the method offers an efficient and effective hole-filling strategy for augmenting compound libraries with compounds from external sources. Because the method has no fit parameters, it should be applicable to most hole-filling problems.
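
The abstract does not spell out the Canvas HF selection rule, but the standard sphere exclusion baseline it is compared against can be sketched. The following is a minimal Python sketch, not Canvas HF itself, assuming each 2D fingerprint is represented as a set of "on" bit indices and that distance is the Tanimoto distance. The names `tanimoto_distance` and `sphere_exclusion_fill` and the parameters `radius` and `max_picks` are hypothetical. Every existing library compound, and every newly picked external compound, excludes all candidates within `radius` of it, so picks land only in regions the library does not already cover.

```python
from typing import List, Set

# Assumption: a fingerprint is the set of indices of its "on" bits.
Fingerprint = Set[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - |a & b| / |a | b|; two empty fingerprints count as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def sphere_exclusion_fill(library: List[Fingerprint],
                          candidates: List[Fingerprint],
                          radius: float,
                          max_picks: int) -> List[int]:
    """Hypothetical sphere-exclusion hole filler (a baseline, not Canvas HF).

    Scans external candidates in input order and keeps a candidate only if it
    lies farther than `radius` from every existing library compound and from
    every candidate already picked. Returns indices into `candidates`.
    """
    selected: List[int] = []
    centres: List[Fingerprint] = list(library)  # sphere centres so far
    for i, fp in enumerate(candidates):
        if len(selected) >= max_picks:
            break
        # Keep the candidate only if it falls outside all exclusion spheres.
        if all(tanimoto_distance(fp, c) > radius for c in centres):
            selected.append(i)
            centres.append(fp)  # the new pick excludes its own neighbourhood
    return selected


if __name__ == "__main__":
    library = [{1, 2, 3, 4}, {10, 11, 12}]
    candidates = [{1, 2, 3, 5}, {20, 21, 22}, {20, 21, 23}, {10, 11, 12, 13}]
    print(sphere_exclusion_fill(library, candidates, radius=0.6, max_picks=10))
    # → [1]
```

In the usage example, candidates 0 and 3 sit close to existing library members, and candidate 2 falls inside the sphere of the newly picked candidate 1, so only candidate 1 is added. Because candidates are scanned in input order, the result depends on that order, and nothing in this sketch guards against picking only outlying compounds, which is the edge-of-space behavior the abstract says Canvas HF is designed to avoid.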
