An Automatic Quality Control Pipeline for High-Throughput Screening Hit Identification

The correction or removal of signal errors in high-throughput screening (HTS) data is critical to the identification of high-quality lead candidates. Although a number of strategies have been developed to correct systematic errors and to remove screening artifacts, they are not universally effective and still require a fair amount of human intervention. We introduce a fully automated quality control (QC) pipeline that can correct generic interplate systematic errors and remove intraplate random artifacts. The new pipeline was first applied to ~100 large-scale historical HTS assays; in silico analysis showed that auto-QC led to a noticeably stronger structure-activity relationship. The method was further tested in several independent HTS runs, where QC results were sampled for experimental validation. Significantly increased hit confirmation rates were obtained after the QC steps, confirming that the proposed method was effective in enriching true-positive hits. An implementation of the algorithm is available to the screening community.
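
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one common way to correct a positional bias that recurs across plates; it is not the authors' pipeline. It assumes each plate is first put on a robust z-score scale (median and MAD), and that a bias shared by all plates can be estimated as the per-well median across plates. The function names, the plate dimensions, and the simulated edge effect are all hypothetical.

```python
import numpy as np

def robust_z(plate):
    """Center one plate on its median and scale by its MAD (robust z-score)."""
    med = np.median(plate)
    mad = np.median(np.abs(plate - med))
    return (plate - med) / (1.4826 * mad)

def correct_shared_well_bias(plates):
    """Estimate and subtract a per-well bias shared across plates.

    plates: array of shape (n_plates, n_rows, n_cols).
    Returns the corrected stack and the estimated bias map.
    """
    z = np.stack([robust_z(p) for p in plates])
    # Most wells are inactive, so the median across plates at a given
    # well position approximates the positional bias at that position.
    well_bias = np.median(z, axis=0)
    return z - well_bias, well_bias

# Simulated data (assumption): 50 plates of 16 x 24 wells, pure noise
# plus an additive bias on the first row of every plate.
rng = np.random.default_rng(0)
n_plates, n_rows, n_cols = 50, 16, 24
noise = rng.normal(0.0, 1.0, size=(n_plates, n_rows, n_cols))
edge = np.zeros((n_rows, n_cols))
edge[0, :] = 2.0
raw = noise + edge

corrected, well_bias = correct_shared_well_bias(raw)
```

In this toy setting the estimated bias map is elevated along the first row and the corrected values have a per-well median of zero across plates. A real pipeline would also need to handle intraplate random artifacts and to guard against removing genuine activity, neither of which this sketch attempts.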
