GalilAI: Out-of-Task Distribution Detection using Causal Active Experimentation for Safe Transfer RL

Out-of-distribution (OOD) detection is a well-studied topic in supervised learning. Extending the successes of supervised learning methods to the reinforcement learning (RL) setting, however, is difficult due to the data generating process: RL agents actively query their environment for data, and the data are a function of the policy followed by the agent. An agent could thus neglect a shift in the environment if its policy did not lead it to explore the aspect of the environment that shifted. Therefore, to achieve safe and robust generalization in RL, there exists an unmet need for OOD detection through active experimentation. Here, we attempt to fill this lacuna by first defining a causal framework for OOD scenarios or environments encountered by RL agents in the wild. Then, we propose a novel task: that of Out-of-Task Distribution (OOTD) detection. We introduce an RL agent that actively experiments in a test environment and subsequently concludes whether it is OOTD or not. We name our method GalilAI, in honor of Galileo Galilei, as it discovers, among other causal processes, that gravitational acceleration is independent of the mass of a body. Finally, we propose a simple probabilistic neural network baseline for comparison, which extends extant Model-Based RL methods. We find that GalilAI significantly outperforms this baseline. See visualizations of our method here.
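To make the flavor of such a probabilistic-network baseline concrete, the following is a minimal sketch, not the authors' implementation: a dynamics model with a Gaussian output head is fit to transitions collected in the training task, and a test environment is flagged as OOTD when its observed transitions receive unusually low predictive likelihood under that model. All names here (ProbabilisticDynamicsModel, is_ootd, the threshold) are illustrative assumptions.

```python
# Sketch of an OOTD-detection baseline via a probabilistic dynamics model.
# Assumption: OOTD is declared when transition NLL exceeds a threshold
# calibrated on held-out transitions from the training task.
import torch
import torch.nn as nn

class ProbabilisticDynamicsModel(nn.Module):
    """Predicts a Gaussian over the next state given (state, action)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, state_dim)       # predictive mean
        self.log_std = nn.Linear(hidden, state_dim)  # predictive log-std

    def forward(self, state, action):
        h = self.body(torch.cat([state, action], dim=-1))
        return self.mu(h), self.log_std(h).clamp(-5.0, 2.0)

def transition_nll(model, state, action, next_state):
    """Per-transition negative log-likelihood of the observed next state."""
    mu, log_std = model(state, action)
    dist = torch.distributions.Normal(mu, log_std.exp())
    return -dist.log_prob(next_state).sum(dim=-1)

def is_ootd(model, state, action, next_state, threshold: float) -> bool:
    """Flag the test environment as OOTD if mean NLL exceeds the threshold."""
    with torch.no_grad():
        return transition_nll(model, state, action, next_state).mean().item() > threshold
```

Note that such a baseline only scores transitions it happens to observe; the point of GalilAI, as described above, is to choose experiments actively so that the task-distinguishing aspect of the environment is actually exercised.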
