Towards TempoRL: Learning When to Act

Reinforcement learning is a powerful approach to learning behaviour through interactions with an environment. However, behaviours are learned in a purely reactive fashion: in every state, an action is selected based solely on the current observation. In this form, it is difficult to learn when it is actually necessary to make a new decision, which makes learning inefficient, especially in environments with very fine-grained time steps. Instead, we propose a more proactive setting in which the agent not only chooses an action in a state but also decides for how long to commit to that action. We demonstrate the effectiveness of the proposed approach on a set of small grid worlds, showing that it learns successful policies much faster than vanilla Q-learning.
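As a rough illustration of this proactive setting, the sketch below augments tabular Q-learning with a second, skip-level value function that decides how many steps to commit to the chosen action. This is a minimal sketch only: the class and parameter names (SkipQAgent, max_skip) and the Gym-style env.reset/env.step interface in the usage comment are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import defaultdict

class SkipQAgent:
    """Tabular Q-learning agent that also learns how long to commit to an action."""

    def __init__(self, n_actions, max_skip=7, alpha=0.1, gamma=0.99, eps=0.1):
        self.n_actions = n_actions
        self.max_skip = max_skip                      # longest allowed commitment, in steps
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.q = defaultdict(lambda: [0.0] * n_actions)        # Q(s, a): which action
        self.skip_q = defaultdict(lambda: [0.0] * max_skip)    # Q((s, a), j): how long

    def act(self, state):
        """Epsilon-greedy choice of an action and of a commitment length in 1..max_skip."""
        if random.random() < self.eps:
            action = random.randrange(self.n_actions)
        else:
            action = max(range(self.n_actions), key=lambda a: self.q[state][a])
        if random.random() < self.eps:
            skip = random.randrange(self.max_skip)
        else:
            skip = max(range(self.max_skip),
                       key=lambda j: self.skip_q[(state, action)][j])
        return action, skip + 1

    def update(self, state, action, skip, rewards, next_state, done):
        """n-step update after repeating `action` for `skip` steps and collecting `rewards`."""
        g = sum(self.gamma ** i * r for i, r in enumerate(rewards))   # discounted return
        bootstrap = 0.0 if done else max(self.q[next_state])
        target = g + (self.gamma ** skip) * bootstrap
        # Update the skip-level value of committing to this action for `skip` steps ...
        sq = self.skip_q[(state, action)]
        sq[skip - 1] += self.alpha * (target - sq[skip - 1])
        # ... and the flat action value with the same n-step target.
        self.q[state][action] += self.alpha * (target - self.q[state][action])


# Hypothetical usage against a Gym-style grid world (env.reset / env.step assumed):
#
#   state = env.reset()
#   done = False
#   while not done:
#       action, skip = agent.act(state)
#       rewards, next_state = [], state
#       for _ in range(skip):
#           next_state, r, done, _ = env.step(action)
#           rewards.append(r)
#           if done:
#               break
#       agent.update(state, action, len(rewards), rewards, next_state, done)
#       state = next_state
```

The separate skip-level value function is what makes the agent "proactive": committing to an action over several steps shortens the effective decision horizon, so value estimates propagate over fewer decisions than in vanilla one-step Q-learning.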
