To Fall Or Not To Fall: A Visual Approach to Physical Stability Prediction

Understanding physical phenomena is a key competence that enables humans and animals to act and interact under uncertain perception in previously unseen environments containing novel objects and their configurations. Developmental psychology has shown that infants acquire such skills from observation at a very early stage. In this paper, we contrast the more traditional model-based route, which relies on explicit 3D representations and physical simulation, with an end-to-end approach that directly predicts stability and related quantities from appearance. We ask whether, and to what extent and quality, such a skill can be acquired directly in a data-driven way, bypassing the need for explicit simulation. We present a learning-based approach, built on simulated data, that predicts the stability of towers composed of wooden blocks under different conditions, as well as quantities related to the potential fall of the towers. The evaluation is carried out on synthetic data and compared to human judgments on the same stimuli.
