The Role of Pretrained Representations for the OOD Generalization of RL Agents

Building sample-efficient agents that generalize out-of-distribution (OOD) in real-world settings remains a fundamental unsolved problem on the path towards achieving higher-level cognition. One particularly promising approach is to begin with low-dimensional, pretrained representations of our world, which should facilitate efficient downstream learning and generalization. By training 240 representations and over 10,000 reinforcement learning (RL) policies on a simulated robotic setup, we evaluate to what extent different properties of pretrained VAE-based representations affect the OOD generalization of downstream agents. We observe that many agents are surprisingly robust to realistic distribution shifts, including the challenging sim-to-real case. In addition, we find that the generalization performance of a simple downstream proxy task reliably predicts the generalization performance of our RL agents under a wide range of OOD settings. Such proxy tasks can thus be used to select pretrained representations that will lead to agents that generalize.
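
To make the proxy-task selection idea concrete, below is a minimal sketch, not the paper's implementation: the encoders, the regression head used as the proxy task, and the synthetic in-distribution/OOD data are all hypothetical placeholders standing in for frozen VAE encoders, a factor-prediction proxy task, and a real distribution shift.

```python
# Minimal sketch (assumed setup, not the authors' code): pick the pretrained
# representation whose simple proxy task generalizes best OOD.
import numpy as np
from sklearn.linear_model import Ridge  # simple proxy head on frozen latents

rng = np.random.default_rng(0)

# Three frozen "pretrained encoders": random linear maps from 64-dim inputs
# to 10-dim latents (hypothetical stand-ins for the VAE encoders).
encoders = [lambda x, W=rng.normal(size=(64, 10)): x @ W for _ in range(3)]

# Hypothetical proxy task: regress ground-truth factors (e.g., object pose)
# from latents; the OOD split is modeled as a crude input shift.
W_f = rng.normal(size=(3, 1))        # fake factor-generating weights
X_train = rng.normal(size=(500, 64))
y_train = X_train[:, :3] @ W_f
X_ood = X_train + 2.0                # stand-in for a distribution shift
y_ood = X_ood[:, :3] @ W_f

def proxy_ood_score(encode):
    """Fit the proxy head on in-distribution latents, return its OOD R^2."""
    head = Ridge().fit(encode(X_train), y_train)
    return head.score(encode(X_ood), y_ood)

# Select the representation whose proxy task generalizes best; the paper's
# finding is that this score also predicts the RL agent's OOD performance.
best = max(range(len(encoders)), key=lambda i: proxy_ood_score(encoders[i]))
print(f"selected encoder #{best}")
```

In the paper's setting, the proxy head would be trained on real latents and evaluated on the actual OOD splits (e.g., shifted object properties or real-world images) before committing to the far more expensive RL training.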
