Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations

From infancy, humans have expectations about how objects will move and interact. Even young children expect objects not to move through one another, teleport, or disappear. They are surprised by mismatches between physical expectations and perceptual observations, even in unfamiliar scenes with completely novel objects. A model that exhibits human-like understanding of physics should be similarly surprised, and adjust its beliefs accordingly. We propose ADEPT, a model that uses a coarse (approximate geometry) object-centric representation for dynamic 3D scene understanding. Inference integrates deep recognition networks, extended probabilistic physical simulation, and particle filtering for forming predictions and expectations across occlusion. We also present a new test set for measuring violations of physical expectations, using a range of scenarios derived from developmental psychology. We systematically compare ADEPT, baseline models, and human expectations on this test set. ADEPT outperforms standard network architectures in discriminating physically implausible scenes, and often performs this discrimination at the same level as people.
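As a rough illustration of how particle filtering can turn physical expectations into a surprise signal, the sketch below tracks a single object moving along one axis and reports, at each step, the negative log of the average likelihood of the new observation under the current particle belief. This is a toy example and not ADEPT's implementation: the 1D constant-velocity dynamics, the Gaussian noise models, the prior on velocity, and all names (`gaussian_pdf`, `surprise_trace`, `dyn_std`, `obs_std`) are assumptions made for illustration only.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def surprise_trace(observations, n_particles=500, dyn_std=0.05,
                   obs_std=0.2, seed=0):
    """Return one surprise value per observation after the first.

    Each particle is a (position, velocity) hypothesis. Surprise is the
    negative log of the mean observation likelihood across particles.
    All noise levels and the velocity prior are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Initialise beliefs around the first observation, with a loose
    # (hypothetical) prior on velocity.
    particles = [
        (observations[0] + rng.gauss(0.0, obs_std), rng.gauss(1.0, 0.5))
        for _ in range(n_particles)
    ]
    surprises = []
    for obs in observations[1:]:
        # Predict: advance each hypothesis with noisy constant-velocity motion.
        particles = [
            (x + v + rng.gauss(0.0, dyn_std), v + rng.gauss(0.0, dyn_std))
            for x, v in particles
        ]
        # Weight: how well does each hypothesis explain the observation?
        weights = [gaussian_pdf(obs, x, obs_std) for x, _ in particles]
        mean_likelihood = sum(weights) / n_particles
        surprises.append(-math.log(max(mean_likelihood, 1e-300)))
        # Resample in proportion to the weights (skipped if all underflow).
        if mean_likelihood > 0.0:
            particles = rng.choices(particles, weights=weights, k=n_particles)
    return surprises


# A smoothly moving object versus one that "teleports" mid-sequence.
plausible = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
implausible = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
plausible_surprise = surprise_trace(plausible)
implausible_surprise = surprise_trace(implausible)
```

Under these assumptions, the teleporting sequence produces a sharp spike in surprise at the frame where the object jumps, while the smooth sequence stays low throughout. Comparing the maximum or the sum of each trace gives a simple per-video plausibility score.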
