On optimal multiple changepoint algorithms for large data

Many common approaches to detecting changepoints, for example those based on statistical criteria such as penalised likelihood or minimum description length, can be formulated in terms of minimising a cost over segmentations. We focus on a class of dynamic programming algorithms that can solve the resulting minimisation problem exactly, and thus find the optimal segmentation under the given statistical criterion. The standard implementations of these dynamic programming methods have a computational cost that scales at least quadratically in the length of the time series. Recently, pruning ideas have been suggested that can speed up the dynamic programming algorithms whilst still being guaranteed to be optimal, in that they find the true minimum of the cost function. Here we extend these pruning methods and introduce two new algorithms for segmenting data: FPOP and SNIP. Empirical results show that FPOP is substantially faster than existing dynamic programming methods and that, unlike the existing methods, its computational efficiency is robust to the number of changepoints in the data. We evaluate the method for detecting copy number variations and observe that FPOP has a computational cost that is competitive even with that of binary segmentation, while giving much more accurate segmentations.
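To make the "minimising a cost over segmentations" formulation concrete, the following is a minimal sketch of the unpruned, quadratic-time dynamic programme that the pruning methods aim to speed up. It is not FPOP or SNIP. The squared-error segment cost, the per-changepoint penalty `beta`, and the function name `optimal_partitioning` are all assumptions chosen for illustration.

```python
from itertools import accumulate


def optimal_partitioning(y, beta):
    """Exactly minimise (sum of segment costs) + beta * (number of changepoints).

    Assumed cost: squared error around each segment's mean.
    Runs in O(n^2) time; no pruning is applied.
    """
    n = len(y)
    # Prefix sums let us evaluate any segment cost in O(1).
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def seg_cost(a, b):
        # Squared-error cost of the segment y[a:b] (b > a).
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    # F[t] is the optimal penalised cost of y[:t]; F[0] = -beta so that
    # the first segment does not pay a changepoint penalty.
    F = [-beta] + [0.0] * n
    last = [0] * (n + 1)  # last[t]: start of the final segment in the optimum for y[:t]
    for t in range(1, n + 1):
        best_s, best_val = 0, F[0] + seg_cost(0, t) + beta
        for s in range(1, t):
            val = F[s] + seg_cost(s, t) + beta
            if val < best_val:
                best_s, best_val = s, val
        F[t], last[t] = best_val, best_s

    # Backtrack through the stored segment starts to recover changepoints.
    changepoints = []
    t = n
    while t > 0:
        if last[t] > 0:
            changepoints.append(last[t])
        t = last[t]
    return sorted(changepoints), F[n]


if __name__ == "__main__":
    data = [0.0] * 10 + [5.0] * 10
    print(optimal_partitioning(data, beta=1.0))
```

The inner loop over every candidate last changepoint `s` is the source of the quadratic cost; pruning methods reduce the set of candidates that must be examined at each `t` without changing the minimum found.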
