Discovering temporally extended features for reinforcement learning in domains with delayed causalities

Discovering temporally delayed causalities from data raises notoriously hard problems in reinforcement learning. In this paper we define a space of temporally extended features, designed to capture such causal structures, via a generating operation. Our discovery algorithm PULSE exploits the generating operation to efficiently discover a sparse subset of features. We provide convergence guarantees and apply our method to train both a model-based and a model-free agent in different domains. In terms of achieved rewards and the number of required features, our method achieves much better results than other feature expansion methods.
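To make the grow-and-prune idea behind such a discovery loop concrete, below is a minimal sketch in Python. It is an illustrative assumption, not the paper's actual PULSE algorithm: the names `generate` and `fit_weights`, the L1-style pruning threshold, and the alternating expand/shrink structure are all hypothetical.

```python
# Hypothetical sketch of a grow-and-prune feature discovery loop in the
# spirit of PULSE. All names and the pruning criterion are assumptions,
# not the algorithm from the paper.

def discover_features(data, generate, fit_weights, n_iterations=10, tol=1e-6):
    """Alternate between expanding a feature set via a generating
    operation and pruning features whose fitted weights vanish.

    generate(active)      -> set of new candidate features reachable from
                             the active set by one generating step
    fit_weights(fs, data) -> dict mapping each feature to its (e.g.
                             L1-regularized) fitted weight
    """
    active = set()  # current sparse feature set
    for _ in range(n_iterations):
        # Grow: add all features reachable by one generating step.
        candidates = active | generate(active)
        # Fit: learn regularized weights over the candidate set.
        weights = fit_weights(candidates, data)
        # Shrink: keep only features with non-negligible weight.
        active = {f for f in candidates if abs(weights[f]) > tol}
    return active
```

The sparsity of the returned set then hinges on the regularization inside `fit_weights` driving the weights of uninformative candidates to (near) zero.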
