Use of mixture distributions to deconvolute the behavior of "hits" and controls in high-throughput screening data.

The stochastic nature of high-throughput screening (HTS) data suggests that statistical methods can extract useful information from it. A foundation of parametric statistics is the study and elucidation of population distributions, which can be modeled using modern spreadsheet software. The methods and results described here apply fundamental concepts of statistical population distributions, analyzed in a spreadsheet, to add tools to a developing armamentarium for extracting information from HTS data. Two HTS kinase assays are analyzed as specific examples. The analyses use normal and gamma distributions, which combine to form mixture distributions. HTS data were found to be described well by such mixture distributions, and deconvolution of the mixtures into their constituent gamma and normal parts provided insight into how the assays performed. In particular, the proportion of hits confirmed was predicted from the original HTS data and used to assess screening assay performance. The analyses also provide a method for determining how hit thresholds (values used to separate active from inactive compounds) affect the proportion of compounds verified as active, and how the threshold can be chosen to optimize the selection process.
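To illustrate how a normal-plus-gamma mixture relates a hit threshold to the expected proportion of confirmed hits, the following is a minimal sketch in Python rather than the spreadsheet the abstract describes. It rests on assumptions that are not stated in the abstract: inactive compounds are taken to follow the normal component and true actives the gamma component, and every parameter value (`w_hit`, `mu`, `sigma`, `shape`, `scale`) is made up for illustration, not fitted to any assay. All function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gamma_pdf(x, shape, scale):
    """Density of a gamma distribution; zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def mixture_pdf(x, w_hit, mu, sigma, shape, scale):
    """Two-component mixture: (1 - w_hit) * normal + w_hit * gamma."""
    return ((1.0 - w_hit) * normal_pdf(x, mu, sigma)
            + w_hit * gamma_pdf(x, shape, scale))


def normal_tail(t, mu, sigma):
    """P(X > t) for the normal component."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))


def gamma_tail(t, shape, scale, steps=20000):
    """P(X > t) for the gamma component, by trapezoidal integration."""
    if t <= 0.0:
        return 1.0
    upper = t + 60.0 * scale * max(1.0, shape)  # far enough that the tail is negligible
    h = (upper - t) / steps
    total = 0.5 * (gamma_pdf(t, shape, scale) + gamma_pdf(upper, shape, scale))
    for i in range(1, steps):
        total += gamma_pdf(t + i * h, shape, scale)
    return total * h


def predicted_confirmation(t, w_hit, mu, sigma, shape, scale):
    """Fraction of compounds above threshold t that belong to the gamma part.

    Assumption: the gamma component represents true actives.
    """
    true_hits = w_hit * gamma_tail(t, shape, scale)
    false_hits = (1.0 - w_hit) * normal_tail(t, mu, sigma)
    return true_hits / (true_hits + false_hits)


# Illustrative parameters only (percent-inhibition scale); not from the paper.
PARAMS = dict(w_hit=0.02, mu=0.0, sigma=10.0, shape=2.0, scale=15.0)

if __name__ == "__main__":
    for threshold in (20.0, 30.0, 40.0):
        rate = predicted_confirmation(threshold, **PARAMS)
        print(f"threshold={threshold:5.1f}  predicted confirmation={rate:.3f}")
```

Under these assumed parameters, raising the threshold increases the predicted confirmation proportion while reducing the number of true actives selected, which is the trade-off a threshold choice has to balance.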