Causal World Models by Unsupervised Deconfounding of Physical Dynamics

The ability to imagine internally with a mental model of the world is vitally important for human cognition. If an intelligent machine agent can learn a world model that creates a "dream" environment, it can internally ask what-if questions, simulating alternative futures that it has not yet experienced, and make optimal decisions accordingly. Existing world models are typically built by learning spatio-temporal regularities embedded in past sensory signals, without taking into account the confounding factors that influence state transition dynamics. As such, they fail to answer the critical counterfactual question of "what would have happened" if a certain action policy had been taken. In this paper, we propose Causal World Models (CWMs), which allow unsupervised modeling of relationships between the intervened observations and the alternative futures by learning an estimator of the latent confounding factors. We empirically evaluate our method and demonstrate its effectiveness in a variety of physical reasoning environments. Specifically, we show reductions in sample complexity for reinforcement learning tasks and improvements in counterfactual physical reasoning.
