CoPhy: Counterfactual Learning of Physical Dynamics

Understanding causes and effects in mechanical systems is an essential component of reasoning in the physical world. This work poses a new problem: counterfactual learning of object mechanics from visual input. We develop the CoPhy benchmark to assess the capacity of state-of-the-art models for causal physical reasoning in a synthetic 3D environment, and we propose a model for learning physical dynamics in a counterfactual setting. Having observed a mechanical experiment that involves, for example, a falling tower of blocks, a set of bouncing balls, or colliding objects, we learn to predict how its outcome is affected by an arbitrary intervention on its initial conditions, such as displacing one of the objects in the scene. The alternative future is predicted from the altered past and a latent representation of the confounders, which the model learns end-to-end without supervision. We compare against feedforward video prediction baselines and show that observing alternative experiences allows the network to capture latent physical properties of the environment, which results in significantly more accurate predictions, at a superhuman level of performance.
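To make the counterfactual setup concrete, the following is a minimal toy sketch, not the model proposed in this work. It assumes a hypothetical one-dimensional experiment: a block sliding on a surface whose friction coefficient is the unobserved confounder. The function names and the closed-form physics are illustrative assumptions; the actual model learns its latent confounder representation from visual input rather than solving for it analytically.

```python
# Toy illustration of counterfactual prediction with a hidden confounder.
# Assumption: a block with initial speed v0 slides until kinetic friction
# stops it, so its stopping distance is d = v0^2 / (2 * mu * g).
# The friction coefficient mu plays the role of the unobserved confounder.

G = 9.81  # gravitational acceleration in m/s^2


def stopping_distance(v0: float, mu: float) -> float:
    """Distance travelled before friction brings the block to rest."""
    return v0 ** 2 / (2.0 * mu * G)


def infer_friction(v0: float, observed_distance: float) -> float:
    """Estimate the hidden friction coefficient from one observed experiment."""
    return v0 ** 2 / (2.0 * G * observed_distance)


def predict_counterfactual(v0_observed: float,
                           d_observed: float,
                           v0_intervened: float) -> float:
    """Predict the outcome after an intervention on the initial speed.

    The confounder estimate comes from the observed experiment and is
    reused for the altered initial condition.
    """
    mu_hat = infer_friction(v0_observed, d_observed)
    return stopping_distance(v0_intervened, mu_hat)


if __name__ == "__main__":
    true_mu = 0.3                                 # hidden from the predictor
    d_obs = stopping_distance(2.0, true_mu)       # observed experiment
    d_cf = predict_counterfactual(2.0, d_obs, 4.0)  # intervention: v0 = 4.0
    d_true = stopping_distance(4.0, true_mu)      # ground-truth alternative
    print(f"predicted: {d_cf:.4f} m, ground truth: {d_true:.4f} m")
```

A predictor that sees only the altered initial speed would have to guess the friction coefficient; observing the original experiment is what makes the alternative outcome identifiable in this toy case.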
