Synthetic Experiences for Accelerating DQN Performance in Discrete Non-Deterministic Environments

State-of-the-art deep reinforcement learning algorithms such as DQN and DDPG rely on a replay buffer known as Experience Replay. By default, this buffer contains only the experiences gathered during the agent's runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones that assist the learner. In this first approach, we limit ourselves to discrete and non-deterministic environments and use a simple, equally weighted average of the observed rewards in combination with the observed follow-up states. We demonstrate a significantly improved overall mean reward compared to a DQN with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.
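
The synthetic-experience construction described above can be illustrated with a short sketch. The following Python snippet is a minimal, hypothetical illustration rather than the paper's reference implementation: it assumes tabular states as in FrozenLake8x8-v0, transitions stored as (state, action, reward, next_state, done) tuples, and illustrative names such as `synthesize`. For every stored state-action pair, it averages the observed rewards with equal weights and pairs that average with each follow-up state actually observed for that pair.

```python
import random
from collections import defaultdict

def synthesize(transitions):
    """Create synthetic transitions from stored real ones (illustrative sketch).

    transitions: iterable of (state, action, reward, next_state, done) tuples,
    e.g. collected in FrozenLake8x8-v0. Returns a list of synthetic tuples.
    """
    grouped = defaultdict(list)
    for s, a, r, s_next, done in transitions:
        grouped[(s, a)].append((r, s_next, done))

    synthetic = []
    for (s, a), outcomes in grouped.items():
        # Equally weighted average of all rewards seen for this (state, action) pair.
        avg_reward = sum(r for r, _, _ in outcomes) / len(outcomes)
        # One synthetic transition per distinct observed follow-up state.
        for s_next, done in {(s_next, done) for _, s_next, done in outcomes}:
            synthetic.append((s, a, avg_reward, s_next, done))
    return synthetic

# Illustrative usage: augment the real buffer with synthetic experiences
# before sampling a training mini-batch.
real_buffer = [(0, 1, 0.0, 4, False), (0, 1, 0.0, 1, False), (0, 1, 1.0, 4, False)]
augmented = real_buffer + synthesize(real_buffer)
batch = random.sample(augmented, k=min(4, len(augmented)))
```

How synthetic and real experiences are weighted or mixed during sampling is a design choice of the actual method; the equal mixing shown here is only an assumption for the sketch.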
