Physics 101: Learning Physical Object Properties from Unlabeled Videos

We study the problem of learning physical properties of objects from unlabeled videos. Humans can learn basic physical laws when they are very young, which suggests that such tasks may be important goals for computational vision systems. We consider various scenarios: objects sliding down an inclined surface and colliding; objects attached to a spring; objects falling onto various surfaces, etc. Many physical properties like mass, density, and coefficient of restitution influence the outcome of these scenarios, and our goal is to recover them automatically. We have collected 17,408 video clips containing 101 objects of various materials and appearances (shapes, colors, and sizes). Together, they form a dataset, named Physics 101, for studying object-centered physical properties. We propose an unsupervised representation learning model, which explicitly encodes basic physical laws into the structure and use them, with automatically discovered observations from videos, as supervision. Experiments demonstrate that our model can learn physical properties of objects from video. We also illustrate how its generative nature enables solving other tasks such as outcome prediction.

[1]  S. Drake,et al.  Galileo at Work: His Scientific Biography , 1969 .

[2]  Allan D. Jepson,et al.  The Computational Perception of Scene Dynamics , 1997, Comput. Vis. Image Underst..

[3]  Matthew Brand,et al.  Physics-Based Visual Understanding , 1997, Comput. Vis. Image Underst..

[4]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[5]  Edward H. Adelson,et al.  On seeing stuff: the perception of materials by humans and machines , 2001, IS&T/SPIE Electronic Imaging.

[6]  R. Baillargeon Infants' Physical World , 2004 .

[7]  S. Carey The Origin of Concepts , 2000 .

[8]  Andrew Zisserman,et al.  A Statistical Approach to Material Classification Using Image Patch Exemplars , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Clément Farabet,et al.  Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[10]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[11]  Jessica B. Hamrick,et al.  Simulation as an engine of physical scene understanding , 2013, Proceedings of the National Academy of Sciences.

[12]  William T. Freeman,et al.  Estimating the Material Properties of Fabric from Video , 2013, 2013 IEEE International Conference on Computer Vision.

[13]  Katsushi Ikeuchi,et al.  Detecting potential falling objects by inferring human action and natural disturbance , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Martial Hebert,et al.  Patch to the Future: Unsupervised Visual Prediction , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Noah Snavely,et al.  Material recognition in the wild with the Materials in Context Database , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Song-Chun Zhu,et al.  Understanding tools: Task-oriented object modeling, learning and recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jiajun Wu,et al.  Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning , 2015, NIPS.

[18]  Sai Kit Yeung,et al.  Fill and Transfer: A Simple Physics-Based Approach for Containability Reasoning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Tsuhan Chen,et al.  3D Reasoning from Blocks to Stability , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Jitendra Malik,et al.  Learning Visual Predictive Models of Physics for Playing Billiards , 2015, ICLR.

[21]  Ali Farhadi,et al.  Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Jiajun Wu,et al.  A Comparative Evaluation of Approximate Probabilistic Simulation and Deep Neural Networks as Accounts of Human Physical Scene Understanding , 2016, CogSci.

[23]  Sergey Levine,et al.  End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[24]  Rob Fergus,et al.  Learning Physical Intuition of Block Towers by Example , 2016, ICML.

[25]  Abhinav Gupta,et al.  The Curious Robot: Learning Visual Representations via Physical Interactions , 2016, ECCV.

[26]  Frédo Durand,et al.  Visual vibrometry: Estimating material properties from small motions in video , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).