Sample Efficient Feature Selection for Factored MDPs

In reinforcement learning, the state of the real world is often represented by feature vectors, but not all features may be pertinent to the task at hand. We propose Feature Selection Explore and Exploit (FS-EE), an algorithm that automatically selects the necessary features while learning a Factored Markov Decision Process (MDP). We prove that, under mild assumptions, its sample complexity scales with the in-degree of the dynamics of only the necessary features rather than with the in-degree of all features. This can yield a much better sample complexity when the necessary features have a smaller in-degree than the full feature set.
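To make the in-degree comparison concrete, here is a minimal sketch on a toy factored transition structure. It is not FS-EE itself. It assumes a hypothetical definition of "necessary features": the features the reward depends on, plus every feature they transitively depend on through the dynamics. The names `necessary_features`, `max_in_degree`, and the toy features are illustrative only.

```python
from typing import Dict, Set

# Hypothetical factored-MDP structure: each feature maps to the set of
# current-step features (its parents) that its next-step value depends on.
Parents = Dict[str, Set[str]]


def necessary_features(parents: Parents, reward_features: Set[str]) -> Set[str]:
    """Assumed definition: reward features plus all of their transitive parents."""
    needed: Set[str] = set()
    stack = list(reward_features)
    while stack:
        f = stack.pop()
        if f in needed:
            continue  # already visited
        needed.add(f)
        stack.extend(parents.get(f, set()))
    return needed


def max_in_degree(parents: Parents, features: Set[str]) -> int:
    """Largest number of parents among the given features (0 if none)."""
    return max((len(parents.get(f, set())) for f in features), default=0)


if __name__ == "__main__":
    # Toy example: 'noise' has many parents but never influences the reward.
    parents: Parents = {
        "pos": {"pos", "vel"},
        "vel": {"vel"},
        "noise": {"pos", "vel", "noise", "light"},
        "light": {"light"},
    }
    needed = necessary_features(parents, {"pos"})
    print(sorted(needed))                       # ['pos', 'vel']
    print(max_in_degree(parents, needed))       # 2
    print(max_in_degree(parents, set(parents))) # 4
```

With binary features, a count-based learner that estimates each transition factor separately may need to visit up to 2^k parent configurations per factor, where k is the in-degree. In this toy case, the gap between in-degree 2 and 4 corresponds to 4 versus 16 configurations. That exponential dependence is why restricting attention to the necessary features can matter.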
