SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

Generalization has been a long-standing challenge for reinforcement learning (RL). Visual RL, in particular, is easily distracted by irrelevant factors in its high-dimensional observation space. In this work, we consider robust policy learning that targets zero-shot generalization to unseen visual environments with large distributional shift. We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization. Specifically, an expert policy is first trained by RL from scratch with weak augmentations. A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust to visual variations than the expert's. Extensive experiments demonstrate that SECANT significantly advances the state of the art in zero-shot generalization across 4 challenging domains. Our average reward improvements over prior state-of-the-art methods are: DeepMind Control (+26.5%), robotic manipulation (+337.8%), vision-based autonomous driving (+47.7%), and indoor object navigation (+15.8%). Code release and video are available at this link.
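To make the two-stage recipe concrete, the sketch below illustrates the second (student-cloning) stage as a plain behavior-cloning update: a frozen expert, already trained by RL under weak augmentation, labels each observation with its action, and the student regresses onto those actions from strongly augmented views of the same observations. This is a minimal sketch under stated assumptions, not the authors' released implementation; the function names, the strong_augment callable, and the MSE loss are all illustrative choices.

import torch
import torch.nn.functional as F

def secant_student_step(student, expert, obs_batch, strong_augment, optimizer):
    """One supervised cloning update for a SECANT-style student stage.

    student: trainable policy network mapping image observations to actions.
    expert: frozen policy trained by RL with weak augmentation.
    obs_batch: tensor of image observations, shape (B, C, H, W).
    strong_augment: callable applying heavy perturbations (e.g. random
        convolution, cutout-color, color jitter) to a batch of images.
    """
    with torch.no_grad():
        # The expert labels the clean (or weakly augmented) observations.
        target_actions = expert(obs_batch)

    # The student only ever sees strongly augmented views, so its
    # representation must discard task-irrelevant visual factors.
    predicted_actions = student(strong_augment(obs_batch))

    # Behavior-cloning loss for continuous control: regress onto the
    # expert's actions (MSE here is an assumption, not prescribed above).
    loss = F.mse_loss(predicted_actions, target_actions)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Because this stage is pure supervised learning, the student never needs reward signals or environment interaction beyond collecting observations, which is what decouples robust representation learning from policy optimization.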
