Multiscale Quantile Segmentation

We introduce a new methodology for analyzing serial data by quantile regression, assuming that the underlying quantile function consists of constant segments. The procedure does not rely on any distributional assumption besides serial independence. It is based on a multiscale statistic, which makes it possible to control the (finite sample) probability of selecting the correct number of segments S at a given error level, which serves as a tuning parameter. For a proper choice of this parameter, this probability tends to one exponentially fast as the sample size increases. We further show that the location and size of segments are estimated at the minimax optimal rate (compared to a Gaussian setting) up to a log-factor. Thereby, our approach leads to (asymptotically) uniform confidence bands for the entire quantile regression function in a fully nonparametric setup. The procedure is efficiently implemented using dynamic programming techniques with double heap structures, and software is provided. Simulations and data examples from genetic sequencing and ion channel recordings confirm the robustness of the proposed procedure, which at the same time reliably detects changes in quantiles from arbitrary distributions with precise statistical guarantees.
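The abstract mentions double heap structures without describing them. As a rough illustration only, and not the authors' implementation, the sketch below shows one common way two heaps can track the empirical quantile of a segment as observations are appended one at a time. The class name `TwoHeapQuantile`, its methods, and the convention that the beta-quantile is the k-th smallest value with k = ceil(beta * n) are assumptions made for this example.

```python
import heapq
import math


class TwoHeapQuantile:
    """Running empirical beta-quantile of a growing sample (hypothetical helper).

    `low` is a max-heap (stored negated) holding the k smallest values,
    `high` is a min-heap holding the rest, with k = max(1, ceil(beta * n)).
    The quantile is then the largest element of `low`.
    """

    def __init__(self, beta):
        if not 0.0 < beta < 1.0:
            raise ValueError("beta must lie strictly between 0 and 1")
        self.beta = beta
        self.low = []   # negated values: -low[0] is the maximum of the lower part
        self.high = []  # high[0] is the minimum of the upper part
        self.n = 0

    def add(self, x):
        """Insert one observation in O(log n) time."""
        self.n += 1
        if not self.low or x <= -self.low[0]:
            heapq.heappush(self.low, -x)
        else:
            heapq.heappush(self.high, x)
        # Rebalance so that `low` holds exactly k elements.
        k = max(1, math.ceil(self.beta * self.n))
        while len(self.low) > k:
            heapq.heappush(self.high, -heapq.heappop(self.low))
        while len(self.low) < k:
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def quantile(self):
        """Return the current empirical beta-quantile."""
        if self.n == 0:
            raise ValueError("no observations yet")
        return -self.low[0]


# Usage: running median of a short sequence.
tracker = TwoHeapQuantile(0.5)
running = []
for value in [5, 1, 3, 2, 4]:
    tracker.add(value)
    running.append(tracker.quantile())
print(running)
```

Each insertion costs logarithmic time, which is why structures of this kind are attractive when a dynamic program has to evaluate quantiles on many candidate segments that grow by one observation at a time.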
