Measuring reproducibility of high-throughput experiments

Reproducibility is essential to reliable scientific discovery in high-throughput experiments. In this work we propose a unified approach to measuring the reproducibility of findings identified from replicate experiments and to identifying putative discoveries using reproducibility. Unlike the usual scalar measures of reproducibility, our approach creates a curve that quantitatively assesses when the findings are no longer consistent across replicates. Our curve is fitted by a copula mixture model, from which we derive a quantitative reproducibility score, which we call the "irreproducible discovery rate" (IDR), analogous to the false discovery rate (FDR). This score can be computed at each set of paired replicate ranks and permits the principled setting of thresholds, both for assessing reproducibility and for combining replicates. Since our approach permits an arbitrary scale for each replicate, it provides useful descriptive measures for exploring a wide variety of situations. We study the performance of the algorithm using simulations and give a heuristic analysis of its theoretical properties. We demonstrate the effectiveness of our method in a ChIP-seq experiment.
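Two ideas from the abstract can be illustrated without fitting the full model. The first is a rank-based agreement curve between two replicates. The second is thresholding by an IDR-style score. What follows is a minimal Python sketch, not the authors' implementation, and it rests on several assumptions. Each replicate scores the same items on its own arbitrary scale, with higher meaning stronger. The curve used here is a simple "share of the top-k list found in both replicates" version. Per-item local irreproducibility probabilities ("local idr") are taken as given, standing in for the output of the fitted copula mixture model, which this sketch does not fit. The IDR of a selected set is estimated as the mean of its local idr values, following the usual FDR-style convention. The function names `ranks_descending`, `correspondence_curve` and `select_by_idr` are hypothetical.

```python
from typing import List, Sequence, Tuple


def ranks_descending(scores: Sequence[float]) -> List[int]:
    """Rank scores so the largest gets rank 1; ties are broken by input position."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def correspondence_curve(x: Sequence[float], y: Sequence[float],
                         ts: Sequence[float]) -> List[Tuple[float, float]]:
    """For each fraction t, return (t, share of the top-k items of replicate x
    that are also in the top-k of replicate y), where k = round(t * n).

    Working on ranks makes the curve independent of each replicate's scale.
    A value of 1.0 means perfect agreement at that depth; a drop marks the
    depth at which findings stop being consistent across replicates.
    """
    if len(x) != len(y):
        raise ValueError("replicates must score the same items")
    n = len(x)
    rx, ry = ranks_descending(x), ranks_descending(y)
    curve = []
    for t in ts:
        k = round(t * n)  # size of the top list taken from each replicate
        both = sum(1 for a, b in zip(rx, ry) if a <= k and b <= k)
        curve.append((t, both / k if k > 0 else 0.0))
    return curve


def select_by_idr(local_idr: Sequence[float], alpha: float) -> List[int]:
    """Return indices of the largest set whose mean local idr is <= alpha.

    Items are admitted in increasing order of local idr, so the running mean
    never decreases and we can stop at the first item that pushes it past alpha.
    """
    order = sorted(range(len(local_idr)), key=lambda i: local_idr[i])
    selected: List[int] = []
    total = 0.0
    for i in order:
        if (total + local_idr[i]) / (len(selected) + 1) > alpha:
            break
        selected.append(i)
        total += local_idr[i]
    return selected


if __name__ == "__main__":
    # Hypothetical scores for 10 candidate peaks on two replicates.
    rep1 = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
    rep2 = [10, 9, 8, 7, 6, 1, 2, 3, 4, 5]
    print(correspondence_curve(rep1, rep2, [0.5, 0.8]))  # [(0.5, 1.0), (0.8, 0.75)]
    # Hypothetical local idr values, standing in for the fitted model's output.
    print(select_by_idr([0.01, 0.02, 0.5, 0.03, 0.9], alpha=0.05))  # [0, 1, 3]
```

In the toy data the two replicates agree perfectly on their top five items, so the curve stays at 1.0 until that depth and then falls once the lower-ranked items begin to disagree. Because only ranks enter the curve, rescaling either replicate's scores leaves it unchanged. This mirrors the abstract's point that each replicate may use an arbitrary scale.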

Peter J. Bickel | Haiyan Huang | James B. Brown | Qunhua Li