Analysis of high-throughput screening assays using cluster enrichment.

In this paper, we describe the implementation and evaluation of a cluster-based enrichment strategy for calling hits from a high-throughput screen, using a typical cell-based assay of 160,000 chemical compounds. We focus on the statistical properties of the prospective design choices made throughout the analysis: how to choose the number of clusters for optimal power, which test statistic to use, where to set the significance threshold for clusters and the activity threshold for candidate hits, how to rank selected hits for carry-forward to the confirmation screen, and how to identify confirmed hits in a data-driven manner. Whereas the literature has previously focused on the choice of test statistic or chemical descriptors, our studies suggest that cluster size is the more important design choice. We recommend ranking clusters by enrichment odds ratio rather than by p-value. Our conceptually simple test statistic identifies the same set of hits as the more complex scoring methods proposed in the literature. We prospectively confirm that such a cluster-based approach can outperform the naive top X approach, and we estimate that it improved confirmation rates by about 31.5%, from 813 using the top X approach to 1187 using our cluster-based method.
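
The abstract does not spell out the test statistic or the multiplicity correction, so the following is only a minimal sketch of what a cluster enrichment analysis of this kind might look like, not the authors' method. It assumes a one-sided hypergeometric test per cluster, Benjamini-Hochberg adjustment of the cluster p-values, and an odds ratio with a 0.5 continuity correction; all function names and the `fdr` parameter are hypothetical. Significant clusters are returned ranked by enrichment odds ratio rather than by p-value, as the abstract recommends.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) actives in a cluster of size n, given K actives among N compounds."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def odds_ratio(k, n, K, N):
    """Enrichment odds ratio of a cluster; 0.5 is added to each cell (an assumption)."""
    a = k + 0.5                    # active, inside the cluster
    b = (n - k) + 0.5              # inactive, inside the cluster
    c = (K - k) + 0.5              # active, outside the cluster
    d = (N - K) - (n - k) + 0.5    # inactive, outside the cluster
    return (a * d) / (b * c)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_clusters(cluster_ids, active_flags, fdr=0.05):
    """Return (cluster, odds ratio, adjusted p) for significant clusters,
    sorted by odds ratio in descending order."""
    N = len(cluster_ids)
    K = sum(active_flags)
    sizes, hits = {}, {}
    for cid, active in zip(cluster_ids, active_flags):
        sizes[cid] = sizes.get(cid, 0) + 1
        hits[cid] = hits.get(cid, 0) + int(active)
    ids = sorted(sizes)
    pvals = [hypergeom_upper_tail(hits[c], sizes[c], K, N) for c in ids]
    qvals = benjamini_hochberg(pvals)
    rows = [
        (c, odds_ratio(hits[c], sizes[c], K, N), q)
        for c, q in zip(ids, qvals)
        if q <= fdr
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Toy data: cluster "A" is rich in actives, "B" and "C" are not.
    ids = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
    flags = [1] * 8 + [0] * 2 + [1] + [0] * 9 + [1] + [0] * 9
    for cluster, oratio, q in enriched_clusters(ids, flags):
        print(cluster, round(oratio, 2), q)
```

In a real screen, the cluster labels would come from clustering compounds on chemical descriptors and the activity flags from thresholding the assay readout; both steps, and the thresholds themselves, are design choices the paper evaluates and are not fixed by this sketch.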
