A heuristic, iterative algorithm for change-point detection in abrupt change models

Change-point detection in abrupt change models is a challenging research topic in many fields of both methodological and applied statistics. Because of strong irregularities, discontinuity and non-smoothness, likelihood-based procedures are awkward: standard optimization methods do not work, and grid search algorithms are the most widely used approach to estimation. This paper introduces a heuristic, iterative algorithm for approximate maximum likelihood estimation of change-points in piecewise constant regression models. The algorithm is based on iterative fitting of simple linear models and appears to extend easily to more general frameworks, such as models that include continuous covariates with possible ties, distinct change-points referring to different covariates, and further covariates without a change-point. In these scenarios grid search algorithms do not apply straightforwardly. The proposed algorithm is validated through simulation studies and applied to two real datasets.
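For readers unfamiliar with the grid search baseline mentioned above, the following minimal sketch shows it for the simplest case: a single change-point in a piecewise constant mean. It is not the heuristic algorithm proposed in the paper. It assumes independent Gaussian errors with common variance, so that maximizing the likelihood is equivalent to minimizing the residual sum of squares (RSS); the function name `grid_search_changepoint` and the example data are hypothetical.

```python
def grid_search_changepoint(y):
    """Grid search for one change-point in a piecewise constant mean.

    Returns (k, rss), where the first segment is y[:k], the second is y[k:],
    and rss is the smallest two-segment residual sum of squares.
    Assumption (not from the paper): i.i.d. Gaussian errors, so the
    minimum-RSS split is also the maximum likelihood split.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")

    total_sum = sum(y)
    total_sq = sum(v * v for v in y)

    best_k, best_rss = None, float("inf")
    left_sum = 0.0
    left_sq = 0.0
    # Try every admissible split; running sums keep each step O(1).
    for k in range(1, n):
        left_sum += y[k - 1]
        left_sq += y[k - 1] ** 2
        right_sum = total_sum - left_sum
        right_sq = total_sq - left_sq
        # RSS of a segment fitted by its own mean: sum(v^2) - (sum v)^2 / m
        rss = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        if rss < best_rss:
            best_k, best_rss = k, rss
    return best_k, best_rss


# Hypothetical data with a level shift after the fourth observation.
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9]
k_hat, rss_hat = grid_search_changepoint(y)
```

The search visits every candidate split exactly once, which is why it stays simple for one change-point in an ordered sequence but becomes cumbersome with several change-points or with change-points on different covariates, the settings the abstract identifies as problematic.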
