Testing and estimation for clustered signals

We propose a change-point detection method for large-scale multiple testing problems in which the signals are clustered. Unlike the classic change-point setup, the signals can vary in size within a cluster. The clustering structure of the signals enables us to effectively delineate the boundaries between signal and non-signal segments. New test statistics are proposed for observations from a single realization or from multiple realizations, and their asymptotic distributions are derived. We also study the associated variance estimation problem. In the multiple-realization case we allow the variances to be heteroscedastic, which substantially expands the applicability of the proposed method. Simulation studies demonstrate that the proposed approach performs favorably. We apply our procedure to an array-based Comparative Genomic Hybridization (aCGH) dataset.
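To make the setting concrete, the following is a minimal illustrative sketch, not the test statistics proposed in the paper. It assumes a single realization with homoscedastic Gaussian noise, estimates the noise variance with a standard first-order difference-based estimator, and locates one signal cluster with a plain standardized scan over contiguous segments. All function names, parameter values, and the simulated data are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_clustered_signals(n=500, start=200, end=260, sigma=1.0, seed=0):
    """Hypothetical data: zero mean outside [start, end), varying signal inside."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(n)
    # Signal sizes vary within the cluster, unlike a piecewise-constant model.
    mu[start:end] = rng.uniform(1.5, 3.0, size=end - start)
    x = mu + sigma * rng.normal(size=n)
    return x, mu

def difference_based_variance(x):
    """First-order difference-based variance estimate (assumes equal variances)."""
    d = np.diff(x)
    return np.sum(d ** 2) / (2 * (len(x) - 1))

def best_segment(x, sigma2, min_len=5):
    """Return (z, i, j) maximizing sum(x[i:j]) / sqrt(sigma2 * (j - i))."""
    n = len(x)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    best = (-np.inf, 0, 0)
    for i in range(n - min_len + 1):
        j = np.arange(i + min_len, n + 1)  # candidate right endpoints
        z = (csum[j] - csum[i]) / np.sqrt(sigma2 * (j - i))
        k = int(np.argmax(z))
        if z[k] > best[0]:
            best = (float(z[k]), i, int(j[k]))
    return best
```

Calling `simulate_clustered_signals()`, passing the result to `difference_based_variance`, and then running `best_segment` with that estimate returns a standardized score together with estimated left and right boundaries of the cluster. The scan here is quadratic in the sequence length and handles only one cluster; it is meant only to show how a variance estimate feeds into boundary detection.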
