A factor model approach for the joint segmentation with between‐series correlation

We consider the segmentation of a set of correlated time series, where the correlation may take an arbitrary form but is the same at every time position. We show that encoding the dependency in a factor model allows the breakpoints to be inferred with the dynamic programming algorithm, which remains one of the most efficient algorithms. We propose a model selection procedure to determine both the number of breakpoints and the number of factors. The method is implemented in the FASeg R package, which is available on CRAN. We demonstrate the performance of the procedure through simulation experiments and present an application to geodesic data.
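
To illustrate the dynamic programming step mentioned in the abstract, here is a minimal Python sketch of joint segmentation with a least-squares cost. It is not the FASeg implementation: it assumes the series share their breakpoints, that the between-series dependency has already been removed (for instance by subtracting an estimated factor component), and that the number of segments is given. The function names `prefix_tables` and `dp_segmentation` are hypothetical.

```python
def prefix_tables(series):
    """Prefix sums of values and of squared values, one pair per series."""
    sums, squares = [], []
    for y in series:
        s, q = [0.0], [0.0]
        for v in y:
            s.append(s[-1] + v)
            q.append(q[-1] + v * v)
        sums.append(s)
        squares.append(q)
    return sums, squares


def dp_segmentation(series, n_segments):
    """Best split of equal-length series into n_segments common segments.

    Assumption: the series are treated as independent here, i.e. any
    between-series correlation has been removed beforehand.
    Returns (breakpoints, cost), where each breakpoint is the index of
    the first point of a new segment.
    """
    n = len(series[0])
    sums, squares = prefix_tables(series)

    def cost(i, j):
        # Residual sum of squares around the segment mean on [i, j),
        # summed over all series.
        total = 0.0
        for s, q in zip(sums, squares):
            seg_sum = s[j] - s[i]
            total += (q[j] - q[i]) - seg_sum * seg_sum / (j - i)
        return total

    inf = float("inf")
    # best[k][j]: minimal cost of splitting the first j points into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    arg = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    arg[k][j] = i

    # Backtrack from the last point to recover the breakpoints.
    breakpoints = []
    j = n
    for k in range(n_segments, 1, -1):
        j = arg[k][j]
        breakpoints.append(j)
    breakpoints.reverse()
    return breakpoints, best[n_segments][n]
```

For example, `dp_segmentation([[0, 0, 0, 5, 5, 5], [1, 1, 1, -2, -2, -2]], 2)` places the single breakpoint at index 3. This exhaustive recursion costs on the order of n² cost evaluations per segment. Choosing the number of segments and of factors would need a separate model selection step, which this sketch does not cover.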
