False discovery rate for scanning statistics

The false discovery rate is a criterion for controlling Type I error in simultaneous testing of multiple hypotheses. For scanning statistics, due to local dependence, clusters of neighbouring hypotheses are likely to be rejected together. In such situations, it is more intuitive and informative to group neighbouring rejections together and count them as a single discovery, with the false discovery rate defined as the proportion of clusters that are falsely declared among all declared clusters. Assuming that the number of false discoveries, under this broader definition of a discovery, is approximately Poisson and independent of the number of true discoveries, we examine approaches for estimating and controlling the false discovery rate, and provide examples from biological applications. Copyright 2011, Oxford University Press.
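The cluster-level counting described above can be illustrated with a minimal Python sketch. It is not the paper's estimator. The function names, the `max_gap` merging rule, and the plug-in estimate (assumed Poisson mean of false clusters divided by the number of declared clusters) are all assumptions made for illustration, and the expected number of false clusters must be supplied by the user.

```python
from typing import List, Tuple


def group_rejections(rejected: List[bool], max_gap: int = 0) -> List[Tuple[int, int]]:
    """Merge rejected positions into clusters given as (start, end), inclusive.

    Hypothetical rule: two rejections fall in the same cluster when at most
    `max_gap` non-rejected positions lie between them.
    """
    clusters: List[Tuple[int, int]] = []
    start = None
    last = None
    for i, is_rejected in enumerate(rejected):
        if not is_rejected:
            continue
        if start is None:
            start, last = i, i
        elif i - last - 1 <= max_gap:
            last = i  # extend the current cluster
        else:
            clusters.append((start, last))
            start, last = i, i  # open a new cluster
    if start is not None:
        clusters.append((start, last))
    return clusters


def estimate_cluster_fdr(n_clusters: int, expected_false: float) -> float:
    """Plug-in estimate (an assumption of this sketch): expected number of
    false clusters under the null divided by the number of declared clusters,
    capped at 1. Returns 0 when nothing is declared."""
    if n_clusters == 0:
        return 0.0
    return min(1.0, expected_false / n_clusters)


if __name__ == "__main__":
    rejected = [False, True, True, False, False, True, False, True]
    clusters = group_rejections(rejected, max_gap=0)
    # Five neighbouring rejections would count as fewer discoveries once grouped.
    print(clusters, estimate_cluster_fdr(len(clusters), expected_false=0.3))
```

Counting clusters rather than individual positions keeps a single locally dependent signal from being tallied as many discoveries, which is the motivation given in the abstract.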

[18]  Nancy R. Zhang,et al.  Detecting simultaneous variant intervals in aligned sequences , 2011, 1108.3177.