Multiple Change-point Detection: a Selective Overview

Very long and noisy sequence data arise in fields ranging from the biological sciences to the social sciences; examples include high-throughput data in genomics and stock prices in econometrics. Such data are often collected to identify and understand shifts in trend, e.g., from a bull market to a bear market in finance, or from a normal to an excessive number of chromosome copies in genetics. Identifying multiple change points in a long, possibly very long, sequence is therefore an important problem. In this article, we review both classical and new multiple change-point detection strategies. Given the long history of, and the extensive literature on, change-point detection, we concentrate our in-depth discussion on a normal mean change-point model, viewed from the perspectives of regression analysis, hypothesis testing, consistency, and inference. In particular, we present a strategy to gather and aggregate local information for change-point detection that has become the cornerstone of several emerging methods because of its attractive computational and theoretical properties.
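
To make the idea of gathering local information concrete, the following is a minimal Python sketch, not the specific procedure reviewed in the article. It assumes a piecewise-constant mean observed with noise, compares the averages of the h observations on either side of each position, and reports positions where that difference is a local maximum above a threshold. The function names `local_mean_diff` and `detect_change_points`, the bandwidth `h`, and the `threshold` are all illustrative assumptions.

```python
import numpy as np


def local_mean_diff(y, h):
    """Return D[t] = mean(y[t:t+h]) - mean(y[t-h:t]) for h <= t <= n-h.

    Positions without h observations on both sides are set to NaN.
    (Hypothetical helper; h is an assumed bandwidth.)
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    c = np.concatenate(([0.0], np.cumsum(y)))  # c[k] = sum of y[:k]
    d = np.full(n, np.nan)
    t = np.arange(h, n - h + 1)
    right = (c[t + h] - c[t]) / h   # average of the h points from t onward
    left = (c[t] - c[t - h]) / h    # average of the h points before t
    d[t] = right - left
    return d


def detect_change_points(y, h, threshold):
    """Keep positions where |D[t]| is a local maximum within +/- h
    and is at least `threshold` (an assumed, user-chosen cutoff)."""
    d = np.abs(local_mean_diff(y, h))
    n = d.size
    found = []
    for t in range(h, n - h + 1):
        lo, hi = max(h, t - h), min(n - h, t + h)
        window = d[lo:hi + 1]
        if d[t] >= threshold and d[t] == window.max():
            found.append(t)
    return found


# Noise-free toy sequence whose mean changes at positions 50 and 100.
y = np.concatenate([np.zeros(50), np.full(50, 3.0), np.full(50, 1.0)])
print(detect_change_points(y, h=10, threshold=1.0))
```

Each statistic uses only 2h neighbouring observations, so the scan runs in time linear in the sequence length once the cumulative sums are computed. In practice the bandwidth and threshold would need to be chosen with the noise level in mind.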
