Bootstrapping a DQN Replay Memory with Synthetic Experiences

An important component of many Deep Reinforcement Learning algorithms is the Experience Replay, which serves as a storage mechanism, or memory, for previously encountered experiences. These experiences are used for training and help the agent to stably converge toward an optimal trajectory through the problem space. The classic Experience Replay, however, uses only the experiences the agent has actually collected, even though the stored samples hold additional, extractable knowledge about the problem. We present an algorithm that creates synthetic experiences in a nondeterministic discrete environment to assist the learner. The Interpolated Experience Replay is evaluated on the FrozenLake environment, and we show that it can help the agent learn faster, and even better, than the classic version.
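The abstract does not spell out the interpolation rule, so the following is only a minimal sketch of one plausible reading: a replay memory that, besides real transitions, injects synthetic ones whose reward is the average over all stored outcomes of the same (state, action) pair in a nondeterministic discrete environment such as FrozenLake. The class name `InterpolatedReplayMemory` and the averaging rule are illustrative assumptions, not the paper's exact method.

```python
import random
from collections import defaultdict, deque


class InterpolatedReplayMemory:
    """Sketch of a replay memory holding real and synthetic (interpolated) experiences."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)
        # Per-(state, action) statistics used to build synthetic experiences.
        self.stats = defaultdict(list)  # (s, a) -> list of (reward, next_state, done)

    def add(self, state, action, reward, next_state, done):
        """Store a real transition and update per-(state, action) statistics."""
        self.buffer.append((state, action, reward, next_state, done))
        self.stats[(state, action)].append((reward, next_state, done))

    def synthesize(self, state, action):
        """Build a synthetic transition for (state, action): average the observed
        rewards and sample one of the observed successor states (an assumed,
        deliberately simple interpolation rule)."""
        outcomes = self.stats.get((state, action))
        if not outcomes:
            return None
        avg_reward = sum(r for r, _, _ in outcomes) / len(outcomes)
        _, next_state, done = random.choice(outcomes)
        return (state, action, avg_reward, next_state, done)

    def add_synthetic(self, state, action):
        """Inject a synthetic experience into the buffer, if one can be built."""
        transition = self.synthesize(state, action)
        if transition is not None:
            self.buffer.append(transition)

    def sample(self, batch_size):
        """Uniformly sample a training batch of real and synthetic transitions."""
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


if __name__ == "__main__":
    memory = InterpolatedReplayMemory()
    # Fake nondeterministic outcomes for a single (state, action) pair.
    for reward in (0.0, 1.0, 0.0):
        memory.add(state=5, action=2, reward=reward, next_state=6, done=False)
    memory.add_synthetic(state=5, action=2)  # synthetic reward is interpolated to ~0.33
    print(memory.sample(batch_size=4))
```

In this reading, the synthetic transitions smooth out the noise of single observed outcomes in a stochastic environment, which is consistent with the claim that the interpolated memory can speed up and stabilize learning.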
