Learning Domain Invariant Representations in Goal-conditioned Block MDPs

Deep Reinforcement Learning (RL) has been successful in solving many complex Markov Decision Process (MDP) problems. However, agents deployed in the real world often face unanticipated environmental changes. These changes are often spurious and unrelated to the underlying task, such as background shifts for agents with visual input. Unfortunately, deep RL policies are usually sensitive to these changes and fail to act robustly against them. This resembles the problem of domain generalization in supervised learning. In this work, we study this problem for goal-conditioned RL agents. We propose a theoretical framework in the Block MDP setting that characterizes the generalizability of goal-conditioned policies to new environments. Under this framework, we develop a practical method, PA-SkewFit, that enhances domain generalization. Empirical evaluation shows that our goal-conditioned RL agent performs well in various unseen test environments, improving by 50% over baselines.
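To make the idea of domain-invariant representations concrete, the sketch below shows one standard way to encourage them: penalizing the Maximum Mean Discrepancy (MMD) between encoder features computed on observation batches from two training domains (e.g., two backgrounds). This is a minimal illustration under assumed choices, not the paper's PA-SkewFit algorithm; the names Encoder, mmd_loss, the network architecture, and the weighting lambda_inv are all hypothetical.

# Minimal sketch of MMD-based domain-invariant representation learning.
# Hypothetical example: not the paper's PA-SkewFit method.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps image observations to a latent representation."""
    def __init__(self, obs_channels: int = 3, feature_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(obs_channels, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(64, feature_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(obs))

def rbf_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # Pairwise RBF kernel k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2)).
    sq_dists = torch.cdist(x, y) ** 2
    return torch.exp(-sq_dists / (2 * sigma ** 2))

def mmd_loss(z_a: torch.Tensor, z_b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # Biased empirical estimate of MMD^2 between two batches of latent features.
    k_aa = rbf_kernel(z_a, z_a, sigma).mean()
    k_bb = rbf_kernel(z_b, z_b, sigma).mean()
    k_ab = rbf_kernel(z_a, z_b, sigma).mean()
    return k_aa + k_bb - 2 * k_ab

# Usage: align latent features across two training domains.
encoder = Encoder()
obs_domain_a = torch.randn(32, 3, 64, 64)  # batch of observations from domain A
obs_domain_b = torch.randn(32, 3, 64, 64)  # batch of observations from domain B
invariance_penalty = mmd_loss(encoder(obs_domain_a), encoder(obs_domain_b))
# total_loss = rl_loss + lambda_inv * invariance_penalty  # lambda_inv is a tunable weight

Driving this penalty toward zero makes the feature distributions of the two domains indistinguishable under the chosen kernel, so a policy reading only these features cannot exploit spurious, domain-specific cues such as the background.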
