Counterfactual Data Augmentation using Locally Factored Dynamics

Many dynamic processes, including common scenarios in robotic control and reinforcement learning (RL), involve a set of interacting subprocesses. Though the subprocesses are not independent, their interactions are often sparse, and the dynamics at any given time step can often be decomposed into locally independent causal mechanisms. Such local causal structures can be leveraged to improve the sample efficiency of sequence prediction and off-policy reinforcement learning. We formalize this by introducing local causal models (LCMs), which are induced from a global causal model by conditioning on a subset of the state space. We propose an approach to inferring these structures given an object-oriented state representation, as well as a novel algorithm for model-free Counterfactual Data Augmentation (CoDA). CoDA uses local structures and an experience replay to generate counterfactual experiences that are causally valid in the global model. We find that CoDA significantly improves the performance of RL agents in locally factored tasks, including the batch-constrained and goal-conditioned settings.
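To make the counterfactual swapping step concrete, below is a minimal sketch of the idea as described in the abstract, not the authors' implementation: when a subset of state components evolves independently of the rest in two stored transitions, those components can be swapped between the transitions to form a new, causally valid transition. All names here (Transition, local_mask, coda_swap) are illustrative assumptions.

```python
import numpy as np
from typing import NamedTuple, Callable, Optional


class Transition(NamedTuple):
    state: np.ndarray       # factored state: one row per object/subprocess
    action: np.ndarray
    next_state: np.ndarray


def coda_swap(
    t1: Transition,
    t2: Transition,
    local_mask: Callable[[Transition], np.ndarray],
) -> Optional[Transition]:
    """Swap locally independent state components between two transitions.

    `local_mask(t)` is assumed to return a boolean vector, one entry per state
    component, True where that component's dynamics were independent of the
    remaining components during transition `t` (the local causal structure).
    """
    m1, m2 = local_mask(t1), local_mask(t2)
    swap = m1 & m2  # only swap components independent in *both* transitions
    if not swap.any():
        return None  # no causally valid swap exists for this pair

    # Splice the independent components of t2 into t1, before and after.
    new_state = np.where(swap[:, None], t2.state, t1.state)
    new_next = np.where(swap[:, None], t2.next_state, t1.next_state)
    # The action is kept from t1 under the simplifying assumption that it only
    # affects the non-swapped components; a fuller treatment would factor and
    # swap action components as well.
    return Transition(new_state, t1.action, new_next)
```

Applied to pairs of transitions sampled from an experience replay, such swaps can multiply the amount of causally valid experience available to an off-policy learner without additional environment interaction.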
