Online Clustering of Processes

The problem of online clustering is considered in the case where each data point is a sequence generated by a stationary ergodic process. Data arrive in an online fashion, so that the sample received at every timestep is either a continuation of some previously received sequence or the start of a new sequence. The dependence between the sequences can be arbitrary. No parametric or independence assumptions are made; the only assumption is that the marginal distribution of each sequence is stationary and ergodic. A novel, computationally efficient algorithm is proposed and is shown to be asymptotically consistent (under a natural notion of consistency). The performance of the proposed algorithm is evaluated on simulated data, as well as on real datasets (motion classification).
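The data-arrival protocol described above can be illustrated with a minimal sketch. It shows only how each incoming sample either extends a previously seen sequence or starts a new one; it is not the proposed clustering algorithm. The class name `OnlineSequenceBuffer`, the method names, and the use of explicit sequence identifiers are assumptions made for illustration and do not come from the paper.

```python
from typing import Dict, Hashable, List, Tuple


class OnlineSequenceBuffer:
    """Hypothetical container for sequences observed so far in an online stream."""

    def __init__(self) -> None:
        # Maps a sequence identifier to the samples received for it so far.
        self.sequences: Dict[Hashable, List[float]] = {}

    def observe(self, seq_id: Hashable, value: float) -> bool:
        """Record one sample; return True if it started a new sequence."""
        is_new = seq_id not in self.sequences
        if is_new:
            self.sequences[seq_id] = []
        # Either way, the sample is appended to the end of its sequence.
        self.sequences[seq_id].append(value)
        return is_new

    def snapshot(self) -> Dict[Hashable, List[float]]:
        """Return a copy of all sequences as they stand at the current timestep."""
        return {key: list(values) for key, values in self.sequences.items()}


# Illustrative stream (made-up values): one (sequence id, sample) pair per timestep.
stream: List[Tuple[str, float]] = [
    ("x1", 0.0), ("x2", 1.0), ("x1", 1.0), ("x3", 0.0), ("x2", 1.0),
]

buffer = OnlineSequenceBuffer()
new_flags = [buffer.observe(seq_id, value) for seq_id, value in stream]
```

In this setting, a clustering procedure would be re-run on the snapshot at each timestep, with both the number of sequences and their lengths growing over time.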
