High dimensional change point estimation via sparse projection

Change points are a very common feature of ‘big data’ that arrive in the form of a data stream. We study high dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the co‐ordinates. The challenge is to borrow strength across the co‐ordinates to detect smaller changes than could be observed in any individual component series. We propose a two‐stage procedure called inspect for estimation of the change points: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series. We then apply an existing univariate change point estimation algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated change points and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data‐generating mechanisms. Software implementing the methodology is available in the R package InspectChangepoint.
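
The two-stage idea described above can be illustrated with a minimal, self-contained sketch for a single change point. It is not the InspectChangepoint implementation: soft-thresholding of the CUSUM matrix is swapped in here as a simple stand-in for the convex optimization step, power iteration is used to approximate the leading left singular vector, and the second stage is a plain CUSUM argmax rather than a full univariate algorithm. All function names, the threshold `lam`, and the toy data are assumptions made for illustration.

```python
import math

def cusum_transform(x):
    """CUSUM transform of a p x n matrix given as a list of rows; returns p x (n-1)."""
    out = []
    for row in x:
        n = len(row)
        total = sum(row)
        left = 0.0
        t_row = []
        for t in range(1, n):
            left += row[t - 1]  # sum of the first t observations
            right_mean = (total - left) / (n - t)
            left_mean = left / t
            t_row.append(math.sqrt(t * (n - t) / n) * (right_mean - left_mean))
        out.append(t_row)
    return out

def soft_threshold(m, lam):
    """Entrywise soft-thresholding (assumed stand-in for the convex problem)."""
    return [[math.copysign(max(abs(v) - lam, 0.0), v) for v in row] for row in m]

def leading_left_singular_vector(m, iters=200):
    """Approximate the leading left singular vector by power iteration on M M^T.

    Starts from the all-ones direction, so it assumes that start is not
    orthogonal to the leading singular vector.
    """
    p, q = len(m), len(m[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(m[j][t] * v[j] for j in range(p)) for t in range(q)]  # M^T v
        u = [sum(m[j][t] * w[t] for t in range(q)) for j in range(p)]  # M (M^T v)
        norm = math.sqrt(sum(a * a for a in u))
        if norm == 0.0:  # everything was thresholded away
            return v
        v = [a / norm for a in u]
    return v

def estimate_single_change_point(x, lam):
    """Stage 1: find a sparse projection direction. Stage 2: locate the change."""
    m = soft_threshold(cusum_transform(x), lam)
    v = leading_left_singular_vector(m)
    p, n = len(x), len(x[0])
    projected = [sum(v[j] * x[j][t] for j in range(p)) for t in range(n)]
    stat = cusum_transform([projected])[0]
    best = max(range(n - 1), key=lambda t: abs(stat[t]))
    return best + 1, v  # number of observations before the estimated change

# Toy data: p = 5 series of length n = 40; only the first two coordinates
# shift in mean after observation z = 25. A small deterministic perturbation
# stands in for noise.
p, n, z = 5, 40, 25
shift = [1.0, 0.8, 0.0, 0.0, 0.0]
x = [[(shift[j] if t >= z else 0.0) + 0.01 * math.sin(7 * j + 1.3 * t)
      for t in range(n)] for j in range(p)]

z_hat, v_hat = estimate_single_change_point(x, lam=0.5)
print(z_hat, [round(a, 3) for a in v_hat])
```

On this toy input the estimated direction is supported on the two coordinates that actually change, which is how the projection borrows strength across them before the univariate step is applied. Real data would need a data-driven choice of `lam` and a proper univariate multiple change point method in the second stage.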
