K-ARMA Models for Clustering Time Series Data

We present an approach to clustering time series data using a model-based generalization of the K-Means algorithm, which we call K-Models. We prove the convergence of this general algorithm and relate it to the hard-EM algorithm for mixture modeling. We first apply our method to an AR(p) clustering example and show how the algorithm can be made robust to outliers using a least-absolute-deviations criterion. We then build the clustering algorithm up for ARMA(p, q) models and extend it to ARIMA(p, d, q) models. We develop a goodness-of-fit statistic for the models fitted to clusters based on the Ljung-Box statistic. We perform experiments on simulated data to show how the algorithm can be used for outlier detection and for detecting distributional drift, and to examine the impact of the initialization method on empty clusters. We also perform experiments on real data which show that our method is competitive with existing methods for similar time series clustering tasks.
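As a concrete illustration of the K-Models iteration described above, the sketch below specializes it to AR(p) cluster models fitted by ordinary least squares with a squared-error assignment criterion. This is a minimal reading of the algorithm under stated assumptions (equal-length series, pooled least-squares fits), not the authors' implementation; the helper names (ar_design, fit_ar, k_ar_models) and the reseeding of empty clusters with a random series are illustrative choices.

```python
import numpy as np

def ar_design(x, p):
    """Lagged design matrix and response for an AR(p) least-squares fit of series x."""
    X = np.column_stack([x[p - j : len(x) - j] for j in range(1, p + 1)])
    y = x[p:]
    return X, y

def fit_ar(cluster_series, p):
    """Fit a single AR(p) model by pooled least squares over all series in a cluster."""
    Xs, ys = zip(*(ar_design(x, p) for x in cluster_series))
    phi, *_ = np.linalg.lstsq(np.vstack(Xs), np.concatenate(ys), rcond=None)
    return phi

def sse(x, phi, p):
    """Sum of squared one-step-ahead residuals of series x under AR coefficients phi."""
    X, y = ar_design(x, p)
    r = y - X @ phi
    return float(r @ r)

def k_ar_models(series, k, p, n_iter=50, seed=0):
    """K-Models clustering with AR(p) cluster models and a squared-error criterion."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(series))
    for _ in range(n_iter):
        models = []
        for j in range(k):
            members = [x for x, g in zip(series, labels) if g == j]
            if not members:                         # empty cluster: reseed with one series
                members = [series[rng.integers(len(series))]]
            models.append(fit_ar(members, p))
        new = np.array([np.argmin([sse(x, m, p) for m in models]) for x in series])
        if np.array_equal(new, labels):             # assignments stable -> converged
            break
        labels = new
    return labels, models

if __name__ == "__main__":
    # Toy demo: two AR(1) regimes with coefficients 0.8 and -0.5.
    def simulate_ar1(phi, n, rng):
        x = np.zeros(n)
        for t in range(1, n):
            x[t] = phi * x[t - 1] + rng.standard_normal()
        return x

    rng = np.random.default_rng(1)
    series = [simulate_ar1(0.8, 300, rng) for _ in range(10)] + \
             [simulate_ar1(-0.5, 300, rng) for _ in range(10)]
    labels, models = k_ar_models(series, k=2, p=1)
    print(labels)
```

Each sweep refits the cluster models on their current members and then reassigns every series to the model with the smallest residual sum of squares; since neither step can increase the total within-cluster fitting error, the objective is monotone non-increasing, which is the mechanism behind the convergence result stated in the abstract.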

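For reference, the classical Ljung-Box statistic on which the cluster-level goodness-of-fit diagnostic is built is, for residual autocorrelations \hat{\rho}_k computed from n residuals of a fitted ARMA(p, q) model,

    Q = n(n + 2) \sum_{k=1}^{h} \hat{\rho}_k^2 / (n - k),

which is asymptotically chi-squared with h - p - q degrees of freedom when the residuals are white noise. How this statistic is aggregated over the series assigned to a cluster is specified in the paper itself.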