Policy Augmentation: An Exploration Strategy For Faster Convergence of Deep Reinforcement Learning Algorithms

Despite advances in deep reinforcement learning, developing an effective exploration strategy remains an open problem. Most existing exploration strategies are based on simple heuristics, require a model of the environment, or train additional deep neural networks to generate imagination-augmented paths. In this paper, a new algorithm, called Policy Augmentation, is introduced. Policy Augmentation builds on a newly developed inductive matrix completion method: it estimates the values of unexplored state-action pairs, helping the agent take actions likely to yield high-value returns during early episodes. Training deep reinforcement learning agents on such high-value rollouts leads to faster convergence. Our experiments show the superior performance of Policy Augmentation. The code is available at https://github.com/arashmahyari/PolicyAugmentation.
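
To make the core idea concrete, below is a minimal sketch of how inductive matrix completion can fill in unexplored entries of a tabular value matrix from side features, which is the mechanism the abstract describes at a high level. This is an illustration only, not the paper's exact formulation: the feature matrices X and Y, their dimensions, the synthetic ground-truth values, and the plain least-squares fit are all assumptions made for the demo.

```python
# Illustrative sketch (not the paper's exact method): inductive matrix
# completion models Q ~= X W Y^T, where X holds per-state side features,
# Y holds per-action side features, and W is learned from the explored
# (observed) state-action pairs only. The fitted model then predicts
# values for every pair, including unexplored ones.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, d_s, d_a = 50, 8, 6, 4  # illustrative sizes

X = rng.normal(size=(n_states, d_s))    # state side-features (assumed given)
Y = rng.normal(size=(n_actions, d_a))   # action side-features (assumed given)

# Synthetic ground-truth low-rank value matrix (demo only) and a mask
# marking the ~30% of state-action pairs the agent has actually explored.
W_true = rng.normal(size=(d_s, d_a))
Q = X @ W_true @ Y.T
mask = rng.random(Q.shape) < 0.3

# Each observed entry yields one linear equation in vec(W):
#   Q[i, j] = x_i^T W y_j = kron(x_i, y_j) . vec(W)   (row-major vec)
obs_i, obs_j = np.nonzero(mask)
design = np.stack([np.kron(X[i], Y[j]) for i, j in zip(obs_i, obs_j)])
w, *_ = np.linalg.lstsq(design, Q[mask], rcond=None)
W = w.reshape(d_s, d_a)

# Augmentation step: predicted values for ALL pairs, explored or not.
Q_hat = X @ W @ Y.T
print("mean |error| on unexplored pairs:",
      float(np.abs((Q_hat - Q)[~mask]).mean()))
```

In this sketch the recovered Q_hat supplies value estimates for state-action pairs the agent has never tried, which is what allows early-episode action selection to favor high-value returns rather than uniform exploration.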
