Partition Tree Weighting

This paper introduces Partition Tree Weighting, an efficient meta-algorithm for piecewise stationary sources. The technique performs Bayesian model averaging over a large class of possible partitions of the data into locally stationary segments, using a prior, closely related to the Context Tree Weighting technique of Willems, that is well suited to data compression applications. It can be applied to any coding distribution at an additional time and space cost only logarithmic in the sequence length. We provide a competitive analysis of the redundancy of our method and explore its application in a variety of settings. The order of the redundancy and the complexity of our algorithm match those of the best competitors available in the literature, and the new algorithm exhibits a superior complexity-performance trade-off in our experiments.
