Learning to Plan via a Multi-Step Policy Regression Method

We propose a new approach to improve inference performance in environments that must be solved with a specific sequence of actions, such as maze environments where an optimal path is to be found. Instead of learning a policy for a single step, we learn a policy that predicts n actions in advance. Our proposed method, policy horizon regression (PHR), uses environment knowledge sampled by A2C to learn an n-dimensional policy vector in a policy distillation setup, yielding n sequential actions per observation. We test our method on the MiniGrid and Pong environments and show a drastic speedup at inference time by successfully predicting sequences of actions from a single observation.
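As a high-level illustration of the idea, the following is a minimal sketch of a multi-step policy head trained by distillation, assuming a PyTorch implementation. The network PHRNet, its layer sizes, and the per-step cross-entropy distillation loss are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PHRNet(nn.Module):
    """Hypothetical multi-step policy head: maps one observation to n
    action distributions, one per future step of the horizon."""

    def __init__(self, obs_dim: int, n_actions: int, horizon: int):
        super().__init__()
        self.horizon = horizon
        self.n_actions = n_actions
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        # One logit vector per step of the horizon.
        self.head = nn.Linear(128, horizon * n_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Returns logits of shape (batch, horizon, n_actions).
        return self.head(self.body(obs)).view(-1, self.horizon, self.n_actions)


def distillation_loss(student_logits: torch.Tensor,
                      teacher_actions: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between the student's per-step distributions and an
    action sequence sampled by the teacher (e.g. A2C) from the environment.
    student_logits: (batch, horizon, n_actions); teacher_actions: (batch, horizon)."""
    b, h, a = student_logits.shape
    return F.cross_entropy(student_logits.view(b * h, a),
                           teacher_actions.view(b * h))


# Usage sketch: a single observation yields a whole plan of n actions,
# so inference needs one forward pass instead of n.
if __name__ == "__main__":
    net = PHRNet(obs_dim=64, n_actions=4, horizon=5)
    obs = torch.randn(1, 64)           # one observation
    plan = net(obs).argmax(dim=-1)     # (1, 5) greedy action sequence
    print(plan)
```

Because the whole action sequence comes from one forward pass, the per-episode inference cost drops roughly by a factor of the horizon n, which is the source of the speedup claimed above.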
