Model-based clustering and segmentation of time series with changes in regime

Mixture model-based clustering, usually applied to multidimensional data, has become a popular approach in many data analysis problems, owing both to its sound statistical properties and to the simplicity of implementing the Expectation–Maximization (EM) algorithm. Motivated by a railway application, this paper introduces a novel mixture model for time series that are subject to changes in regime. The proposed approach, called ClustSeg, models each cluster by a regression model whose polynomial coefficients vary according to a discrete hidden process. In particular, logistic functions are used to model the (smooth or abrupt) transitions between regimes. The model parameters are estimated by maximum likelihood using an EM algorithm. The approach can also be regarded as a clustering method that groups time series sharing common changes in regime; in addition to a partition of the time series, it therefore provides a segmentation of them. The numbers of clusters and segments are selected using the Bayesian Information Criterion. The effectiveness of ClustSeg is demonstrated on a variety of simulated time series and on real-world time series of electrical power consumption from rail switching operations.
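To make the role of the logistic functions concrete, the following is a minimal sketch of the mean curve of a single cluster, assuming each regime is a polynomial in time and the regime weights are a softmax (multinomial logistic) function of time. It is not the authors' implementation; the function names, the parameter layout (`w`, `beta`) and the numerical values are illustrative assumptions only.

```python
import numpy as np

def regime_probabilities(t, w):
    """Logistic (softmax) weight of each regime at each time point.

    t: (m,) time points.
    w: (R, 2) array, one [intercept, slope] row per regime (assumed layout).
    """
    scores = w[:, 0][None, :] + np.outer(t, w[:, 1])   # shape (m, R)
    scores -= scores.max(axis=1, keepdims=True)        # numerical stability
    expd = np.exp(scores)
    return expd / expd.sum(axis=1, keepdims=True)

def cluster_mean_curve(t, w, beta):
    """Mean curve of one cluster: logistic-weighted sum of polynomial regimes.

    beta: (R, p + 1) polynomial coefficients, lowest degree first.
    """
    design = np.vander(t, beta.shape[1], increasing=True)  # (m, p + 1)
    regime_curves = design @ beta.T                         # (m, R)
    pi = regime_probabilities(t, w)
    return (pi * regime_curves).sum(axis=1)

# Hypothetical example: two constant regimes (levels 1 and 3)
# with a fairly abrupt switch near t = 0.5.
t = np.linspace(0.0, 1.0, 101)
w = np.array([[0.0, 0.0], [-50.0, 100.0]])
beta = np.array([[1.0, 0.0], [3.0, 0.0]])
pi = regime_probabilities(t, w)
mean = cluster_mean_curve(t, w, beta)
```

In this sketch, a larger slope in `w` gives a more abrupt transition and a smaller slope gives a smoother one. A full mixture would hold one such parameter set per cluster and fit them all with EM.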
