Discovering temporally extended features for reinforcement learning in domains with delayed causalities

Discovering temporally delayed causalities from data raises notoriously hard problems in reinforcement learning. In this paper we define a space of temporally extended features, designed to capture such causal structures, via a generating operation. Our discovery algorithm PULSE exploits the generating operation to efficiently discover a sparse subset of features. We provide convergence guarantees and apply our method to train both a model-based and a model-free agent in different domains. In terms of achieved rewards and the number of required features, our method achieves much better results than other feature expansion methods.
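To make the grow-and-prune idea behind such a discovery loop concrete, below is a minimal sketch in Python. It is an illustrative assumption, not the paper's actual PULSE algorithm: the names `generate` and `fit_weights`, the L1-style pruning threshold, and the alternating expand/shrink structure are all hypothetical.

```python
# Hypothetical sketch of a grow-and-prune feature discovery loop in the
# spirit of PULSE. All names and the pruning criterion are assumptions,
# not the algorithm from the paper.

def discover_features(data, generate, fit_weights, n_iterations=10, tol=1e-6):
    """Alternate between expanding a feature set via a generating
    operation and pruning features whose fitted weights vanish.

    generate(active)      -> set of new candidate features reachable from
                             the active set by one generating step
    fit_weights(fs, data) -> dict mapping each feature to its (e.g.
                             L1-regularized) fitted weight
    """
    active = set()  # current sparse feature set
    for _ in range(n_iterations):
        # Grow: add all features reachable by one generating step.
        candidates = active | generate(active)
        # Fit: learn regularized weights over the candidate set.
        weights = fit_weights(candidates, data)
        # Shrink: keep only features with non-negligible weight.
        active = {f for f in candidates if abs(weights[f]) > tol}
    return active
```

The sparsity of the returned set then hinges on the regularization inside `fit_weights` driving the weights of uninformative candidates to (near) zero.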
