Data Shaving: A Focused Screening Approach

The number of compounds available for evaluation as part of the drug discovery process continues to increase. These compounds may exist physically or be stored electronically, allowing screening by either actual or virtual means. This growing number of compounds has created an increasing need for effective strategies to direct screening efforts. Initial efforts toward this goal led to the development of methods to select diverse sets of compounds for screening, methods to cluster actives into related groups of compounds, and tools to select compounds similar to actives of interest for further screening. In this work we extend these earlier efforts by exploiting information about inactive compounds to help make rational decisions about which sets of compounds to include in a continuing screening campaign or in a focused follow-up effort. The method uses information from inactive compounds to "shave" off compounds similar to inactives from further consideration, or to deprioritize them. It can be used in two ways: first, to provide a rational means of deciding when enough compounds containing certain structural features have been tested, and second, to enhance similarity searching around known actives. Similarity searching is improved by deprioritizing compounds predicted to be inactive because they contain structural features associated with inactivity.
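The following is an illustrative sketch of how inactive compounds could deprioritize candidates in a similarity search. It is not the authors' implementation. It assumes that each compound is represented as a binary fingerprint stored as a set of "on" bit indices, that Tanimoto similarity is the measure, and that a candidate is "shaved" when its nearest inactive neighbour meets a cutoff `shave_threshold`. The function names and the cutoff value are hypothetical choices made for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed representation: a fingerprint is the set of bit positions that are "on".
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(fp: Fingerprint, refs: List[Fingerprint]) -> float:
    """Similarity of fp to its nearest neighbour in refs (0.0 if refs is empty)."""
    return max((tanimoto(fp, r) for r in refs), default=0.0)


def shaved_ranking(
    candidates: Dict[str, Fingerprint],
    actives: List[Fingerprint],
    inactives: List[Fingerprint],
    shave_threshold: float,  # hypothetical cutoff; not a value from the paper
) -> List[Tuple[str, float, bool]]:
    """Rank candidates by similarity to actives, moving shaved ones to the end.

    Shaved candidates are deprioritized rather than discarded, so they
    remain available if the unshaved list is exhausted.
    """
    scored = []
    for name, fp in candidates.items():
        to_actives = max_similarity(fp, actives)
        shaved = max_similarity(fp, inactives) >= shave_threshold
        scored.append((name, to_actives, shaved))
    # False sorts before True, so unshaved candidates come first;
    # within each group, higher similarity to the actives ranks higher.
    scored.sort(key=lambda t: (t[2], -t[1]))
    return scored


if __name__ == "__main__":
    actives = [frozenset({1, 2, 3, 4})]
    inactives = [frozenset({1, 2, 3, 5})]
    candidates = {
        "A": frozenset({1, 2, 3, 4, 5}),  # close to the active, but also to the inactive
        "B": frozenset({1, 2, 4, 6}),
        "C": frozenset({1, 2, 3, 4, 7}),
    }
    ranking = shaved_ranking(candidates, actives, inactives, shave_threshold=0.7)
    print([name for name, _, _ in ranking])  # ['C', 'B', 'A']
```

In this toy example, candidates A and C are equally similar to the active (0.8). However, A is also 0.8 similar to a known inactive, so it drops below the less similar but unshaved candidate B. Because shaving here reorders candidates rather than deleting them, a follow-up screen can still reach A if the higher-priority compounds are exhausted.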
