Robust Reinforcement Learning under model misspecification

Reinforcement learning has recently achieved remarkable performance on a wide range of tasks. Nevertheless, some unsolved problems limit its application to real-world control. One of them is model misspecification: a situation where an agent is trained and deployed in environments with different transition dynamics. We propose a novel framework that utilizes historical trajectories and Partially Observable Markov Decision Process (POMDP) modeling to address this problem. Additionally, we put forward an efficient adversarial attack method to assist robust training. Our experiments in four gym domains validate the effectiveness of our framework.

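The abstract gives no implementation details, so the following is only an illustrative sketch of the two ingredients it names: a history-conditioned (recurrent) policy of the kind commonly used for POMDPs, and a generic one-step PGD-style observation attack standing in for the unspecified adversarial method. All names (`HistoryPolicy`, `observation_attack`), architectures, and dimensions below are hypothetical and assume PyTorch; none are taken from the paper.

```python
# Illustrative sketch only; not the authors' implementation.
import torch
import torch.nn as nn

class HistoryPolicy(nn.Module):
    """Policy conditioned on the observation-action history, letting the agent
    implicitly infer the (misspecified) transition dynamics (POMDP view)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 128):
        super().__init__()
        # Encode each (observation, previous action) pair, then roll an LSTM
        # over the trajectory to produce a belief-like summary state.
        self.encoder = nn.Linear(obs_dim + act_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.mean_head = nn.Linear(hidden_dim, act_dim)  # Gaussian policy mean

    def forward(self, obs_seq, prev_act_seq, hidden=None):
        # obs_seq: (batch, T, obs_dim); prev_act_seq: (batch, T, act_dim)
        x = torch.relu(self.encoder(torch.cat([obs_seq, prev_act_seq], dim=-1)))
        summary, hidden = self.lstm(x, hidden)
        return torch.tanh(self.mean_head(summary)), hidden

def observation_attack(policy, obs_seq, prev_act_seq, epsilon=0.01):
    """Generic one-step PGD-style attack (a stand-in, since the paper's attack
    is not described in the abstract): perturb observations within an L-inf
    ball to push the policy output away from its clean action."""
    with torch.no_grad():
        clean_mean, _ = policy(obs_seq, prev_act_seq)
    # Random start: at the clean point the distance objective has zero
    # gradient, so we begin from a random point inside the epsilon ball.
    delta = torch.empty_like(obs_seq).uniform_(-epsilon, epsilon)
    obs_adv = (obs_seq + delta).requires_grad_(True)
    mean, _ = policy(obs_adv, prev_act_seq)
    (mean - clean_mean).pow(2).sum().backward()
    with torch.no_grad():
        # One signed-gradient step, projected back onto the epsilon ball.
        step = obs_adv + epsilon * obs_adv.grad.sign() - obs_seq
        return (obs_seq + step.clamp(-epsilon, epsilon)).detach()

# Usage with made-up dimensions (e.g., a MuJoCo-sized observation/action space).
policy = HistoryPolicy(obs_dim=17, act_dim=6)
obs = torch.randn(1, 10, 17)   # 10-step observation history
acts = torch.zeros(1, 10, 6)   # previous actions (zeros at episode start)
adv_obs = observation_attack(policy, obs, acts)
action_mean, _ = policy(adv_obs, acts)
print(action_mean.shape)       # torch.Size([1, 10, 6])
```

In a robust-training loop of this general shape, the policy would be updated on the adversarially perturbed histories rather than the clean ones, so that it learns behavior that degrades gracefully when the deployed dynamics differ from the training dynamics.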