An introduction to statistical issues in high-throughput screens

We describe the nature and goals of high-throughput screening experiments, focusing on the challenges they present from the viewpoint of statistical analysis. We suggest graphical displays to facilitate quality control. We describe sources of systematic variation and methods to correct for it. We consider the problem of ranking compounds by their effects on a single cell type and suggest two procedures, chosen according to the number of replicates available. Finally, we explore the use of a hierarchical framework for hypothesis testing to study the effects of compounds in multiple cell lines.
