Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning

Reinforcement Learning algorithms can learn complex behavioral patterns for sequential decision-making tasks wherein an agent interacts with an environment and receives feedback in the form of rewards sampled from it. Traditionally, such algorithms make decisions, i.e., select actions to execute, at every single time step of the agent-environment interaction. In this paper, we propose a novel framework, Fine Grained Action Repetition (FiGAR), which enables the agent to decide the action as well as the time scale over which to repeat it. FiGAR can be used to improve any Deep Reinforcement Learning algorithm that maintains an explicit policy estimate, by enabling temporal abstractions in the action space. We empirically demonstrate the efficacy of our framework by showing performance improvements on top of three policy search algorithms in different domains: Asynchronous Advantage Actor Critic in the Atari 2600 domain, Trust Region Policy Optimization in the MuJoCo domain, and Deep Deterministic Policy Gradients in the TORCS car racing domain.
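
The factored decision FiGAR describes, choosing an action and a repetition count jointly, can be sketched concretely. Below is a minimal PyTorch illustration, not the authors' implementation: the network sizes, the repetition set {1..max_rep}, and all identifiers (FiGARPolicy, act) are assumptions for exposition. It shows a policy with two heads, one distribution over actions and one over how many steps to repeat the chosen action.

```python
import torch
import torch.nn as nn

class FiGARPolicy(nn.Module):
    """Sketch of a FiGAR-style factored policy: one head selects the
    action, a second head selects its repetition count. Layer sizes
    and the repetition set are illustrative assumptions."""

    def __init__(self, obs_dim, n_actions, max_rep=4, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.action_head = nn.Linear(hidden, n_actions)  # logits for pi(a | s)
        self.repeat_head = nn.Linear(hidden, max_rep)    # logits for pi(x | s)

    def forward(self, obs):
        h = self.trunk(obs)
        return (torch.distributions.Categorical(logits=self.action_head(h)),
                torch.distributions.Categorical(logits=self.repeat_head(h)))

def act(policy, obs):
    """Sample an (action, repetition) pair; the environment then executes
    `action` for `repeat` consecutive time steps before the agent
    decides again."""
    pi_a, pi_x = policy(obs)
    action = pi_a.sample()
    repeat = pi_x.sample()  # 0-indexed head output; count is repeat + 1
    return action.item(), repeat.item() + 1
```

Because both heads share a trunk and emit ordinary categorical distributions, the usual policy gradient machinery (A3C, TRPO, DDPG variants mentioned above) applies with the joint log-probability being the sum of the two heads' log-probabilities.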
