Regioned Episodic Reinforcement Learning

Goal-oriented reinforcement learning algorithms are often good at exploration but not exploitation, while episodic algorithms excel at exploitation but not exploration. As a result, neither approach alone yields a sample-efficient algorithm in complex environments with high-dimensional state spaces and delayed rewards. Motivated by these shortcomings, we introduce Regioned Episodic Reinforcement Learning (RERL), which combines the strengths of episodic and goal-oriented learning to produce a more sample-efficient and effective algorithm. RERL achieves this by decomposing the state space into several sub-space regions and constructing regions that lead to more effective exploration and high-value trajectories. Extensive experiments on various benchmark tasks show that RERL outperforms existing methods in terms of sample efficiency and final rewards.
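
To make the decomposition concrete, below is a minimal, hypothetical Python sketch of the idea (the class name RegionedEpisodicMemory, the grid-based region_id, and the propose_goal heuristic are illustrative assumptions, not the paper's actual algorithm or API): states are discretized into regions, each region keeps an episodic record of its best observed return, and exploration goals are drawn from the least-visited region.

```python
# Illustrative sketch only: combines an episodic record per region (exploitation)
# with goal proposals from under-visited regions (exploration).
from collections import defaultdict
import numpy as np


class RegionedEpisodicMemory:
    """Hypothetical helper: discretize states into grid regions and track
    per-region visit counts and the best episodic return seen in each."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.visits = defaultdict(int)   # region id -> number of stored states
        self.best_return = {}            # region id -> best episodic return so far
        self.best_state = {}             # region id -> state that achieved it

    def region_id(self, state):
        # Simple grid discretization; a learned or graph-based partition
        # could replace this without changing the interface.
        cell = np.floor(np.asarray(state, dtype=float) / self.cell_size)
        return tuple(cell.astype(int))

    def store(self, state, episodic_return):
        rid = self.region_id(state)
        self.visits[rid] += 1
        # Episodic (exploitation) side: remember the best outcome per region.
        if rid not in self.best_return or episodic_return > self.best_return[rid]:
            self.best_return[rid] = episodic_return
            self.best_state[rid] = np.asarray(state, dtype=float)

    def propose_goal(self):
        # Goal-oriented (exploration) side: pick the least-visited region and
        # reuse its best recorded state as the next goal.
        if not self.best_state:
            return None
        rid = min(self.best_state, key=lambda r: self.visits[r])
        return self.best_state[rid]


# Usage: after each episode, store the reached state and its return,
# then query the memory for the next exploration goal.
memory = RegionedEpisodicMemory(cell_size=0.5)
memory.store(state=[0.3, 1.2], episodic_return=4.0)
memory.store(state=[0.4, 1.1], episodic_return=2.0)  # falls in the same region
memory.store(state=[2.1, 0.7], episodic_return=1.5)
print(memory.propose_goal())  # -> [2.1 0.7], best state of the least-visited region
```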
