Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp

Although deep reinforcement learning (RL) has been successfully applied to a variety of robotic control tasks, its poor sample efficiency still makes it challenging to apply to real-world tasks. To overcome this shortcoming, several works focus on reusing the collected trajectory data during training by decomposing trajectories into sets of policy-irrelevant discrete transitions. However, their improvements are somewhat marginal since i) the number of such transitions is usually small, and ii) value assignment only happens at the joint states. To address these issues, this paper introduces a concise yet powerful method to construct Continuous Transitions, which exploit the trajectory information by leveraging the potential transitions along the trajectory. Specifically, we propose to synthesize new transitions for training by linearly interpolating consecutive transitions. To keep the constructed transitions authentic, we also develop a discriminator to guide the construction process automatically. Extensive experiments demonstrate that the proposed method achieves a significant improvement in sample efficiency on various complex continuous robotic control problems in MuJoCo and outperforms advanced model-based and model-free RL methods. The source code is available.
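As a rough illustration of the interpolation step described above (not the authors' released code), the sketch below mixes each stored transition with its successor from the same trajectory using a mixup-style coefficient drawn from a Beta distribution; the function name `mix_consecutive_transitions`, the batch layout, and the Beta(alpha, alpha) parameterization are assumptions made for the example.

```python
import numpy as np

def mix_consecutive_transitions(batch, alpha=0.5, rng=None):
    """Hypothetical sketch: linearly interpolate each transition (s_t, a_t, r_t, s_{t+1})
    with its successor (s_{t+1}, a_{t+1}, r_{t+1}, s_{t+2}).

    `batch` is a dict of arrays in which index i and i+1 are consecutive steps
    of the same trajectory (no episode boundary in between).
    """
    if rng is None:
        rng = np.random.default_rng()
    s, a, r, s_next = batch["obs"], batch["act"], batch["rew"], batch["next_obs"]
    # One mixing coefficient per consecutive pair, broadcast over feature dimensions.
    lam = rng.beta(alpha, alpha, size=(len(s) - 1, 1))
    mixed = {
        "obs":      lam * s[:-1]      + (1.0 - lam) * s[1:],
        "act":      lam * a[:-1]      + (1.0 - lam) * a[1:],
        "rew":      (lam * r[:-1, None] + (1.0 - lam) * r[1:, None]).squeeze(-1),
        "next_obs": lam * s_next[:-1] + (1.0 - lam) * s_next[1:],
    }
    return mixed  # synthetic "continuous transitions" used alongside the real ones
```

In the paper, the mixing distribution is not fixed: a discriminator guides how aggressively transitions may be interpolated so that the synthesized samples stay consistent with the environment dynamics; the constant Beta(alpha, alpha) above is only a placeholder for that mechanism.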
