Detecting and overcoming systematic bias in high-throughput screening technologies: a comprehensive review of practical issues and methodological solutions

Significant efforts have been made recently to improve data throughput and data quality in screening technologies related to drug design. The modern pharmaceutical industry relies heavily on high-throughput screening (HTS) and high-content screening (HCS) technologies, which include small-molecule, complementary DNA (cDNA) and RNA interference (RNAi) screening. Data generated by these technologies are subject to several environmental and procedural systematic biases, which introduce errors into the hit identification process. We first review the systematic biases typical of HTS and HCS screens. We highlight that study design and the way in which data are generated are crucial for obtaining unbiased screening results. Using several data sets, including the publicly available ChemBank data, we assess the rates of systematic bias in experimental HTS by applying plate-specific and assay-specific error detection tests. We describe the main data normalization and correction techniques and introduce a general data preprocessing protocol. We recommend this protocol to academic and industrial researchers involved in the analysis of current or next-generation HTS data.
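The abstract does not say which normalization or detection techniques the review covers, so the following is only an illustrative sketch of the general idea, not the authors' protocol. It assumes a plate is stored as a list of rows of raw measurements, applies a plate-wise z-score (one widely used normalization), and then compares row and column means, where large differences may hint at positional bias such as an edge effect. The function names and the 3x4 plate values are hypothetical.

```python
from statistics import mean, stdev


def zscore_plate(plate):
    """Return plate-wise z-scores for a plate given as a list of rows."""
    values = [v for row in plate for v in row]
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation over all wells
    if sigma == 0:
        raise ValueError("plate has zero variance; z-scores are undefined")
    return [[(v - mu) / sigma for v in row] for row in plate]


def row_col_means(plate):
    """Mean of each row and each column of a plate."""
    row_means = [mean(row) for row in plate]
    col_means = [mean(col) for col in zip(*plate)]
    return row_means, col_means


# Hypothetical 3x4 plate whose first row is artificially elevated,
# mimicking a row-wise (edge) bias. The values are made up.
plate = [
    [12.0, 12.5, 11.5, 12.0],
    [10.0, 10.5, 9.5, 10.0],
    [10.0, 9.5, 10.5, 10.0],
]

z = zscore_plate(plate)
rows, cols = row_col_means(z)
print("row means:", [round(r, 2) for r in rows])
print("column means:", [round(c, 2) for c in cols])
```

In this toy plate the first row's mean z-score stands well above the other two, while the column means stay close together. A real analysis would follow such a visual check with a formal test and would likely use robust statistics (medians rather than means) so that genuine hits do not distort the estimate.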
