Skip Context Tree Switching

Context Tree Weighting is a powerful probabilistic sequence prediction technique that efficiently performs Bayesian model averaging over the class of all prediction suffix trees of bounded depth. In this paper we show how to generalize this technique to the class of K-skip prediction suffix trees. Unlike regular prediction suffix trees, K-skip prediction suffix trees are permitted to ignore up to K contiguous portions of the context. This allows for significant improvements in predictive accuracy when irrelevant variables are present, as often occurs in record-aligned data and images. We provide a regret-based analysis of our approach, and empirically evaluate it on the Calgary corpus and a set of Atari 2600 screen prediction tasks.
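To make the baseline concrete, the following is a minimal sketch of standard Context Tree Weighting (not the K-skip extension proposed here) for binary sequences. It follows the usual construction: each context node keeps Krichevsky-Trofimov (KT) counts, and an internal node's weighted probability mixes its local KT estimate with the product of its children's weighted probabilities with weight 1/2 each. All names (`Node`, `ctw_log_prob`, etc.) are illustrative, not from the paper.

```python
import math

class Node:
    """One context node: KT counts plus children keyed by the next older context bit."""
    def __init__(self):
        self.a = 0            # number of 0s observed in this context
        self.b = 0            # number of 1s observed in this context
        self.children = {}    # '0'/'1' -> Node (contexts one symbol deeper)

def log_kt(a, b):
    """Log KT block probability for counts (a, b), built up sequentially.
    KT is exchangeable, so the order of the simulated updates is irrelevant."""
    lp, x, y = 0.0, 0, 0
    for _ in range(a):
        lp += math.log((x + 0.5) / (x + y + 1)); x += 1
    for _ in range(b):
        lp += math.log((y + 0.5) / (x + y + 1)); y += 1
    return lp

def log_pw(node, depth, max_depth):
    """Log weighted probability: Pw = Pe at leaves,
    Pw = 0.5 * Pe + 0.5 * prod(children Pw) at internal nodes."""
    lpe = log_kt(node.a, node.b)
    if depth == max_depth or not node.children:
        return lpe
    # Absent children have empty counts (Pw = 1), so summing present ones suffices.
    lch = sum(log_pw(c, depth + 1, max_depth) for c in node.children.values())
    m = max(lpe, lch)  # log-sum-exp for numerical stability
    return m + math.log(0.5 * math.exp(lpe - m) + 0.5 * math.exp(lch - m))

def ctw_log_prob(bits, depth):
    """Log CTW block probability of bits[depth:], conditioned on the first
    `depth` symbols as initial context. `bits` is a string of '0'/'1'."""
    root = Node()
    for t in range(depth, len(bits)):
        # Walk the context path (most recent symbol first), updating counts.
        node, path = root, [root]
        for k in range(1, depth + 1):
            node = node.children.setdefault(bits[t - k], Node())
            path.append(node)
        for n in path:
            if bits[t] == '0':
                n.a += 1
            else:
                n.b += 1
    return log_pw(root, 0, depth)
```

A sequential prediction for the next symbol is recovered as the ratio of block probabilities, e.g. `exp(ctw_log_prob(s + "1", d) - ctw_log_prob(s, d))` (ignoring the conditioning offset at the boundary). The skip extension studied in the paper enlarges the model class mixed over at each node so that portions of the context can be skipped; the recursive mixture structure above is what makes that averaging tractable.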
