Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces

We introduce an approach to model surface properties governing bounces in everyday scenes. Our model learns end-to-end, starting from sensor inputs, to predict post-bounce trajectories and infer two underlying physical properties that govern bouncing - restitution and effective collision normals. Our model, Bounce and Learn, comprises two modules -- a Physics Inference Module (PIM) and a Visual Inference Module (VIM). VIM learns to infer physical parameters for locations in a scene given a single still image, while PIM learns to model physical interactions for the prediction task given physical parameters and observed pre-collision 3D trajectories. To achieve our results, we introduce the Bounce Dataset comprising 5K RGB-D videos of bouncing trajectories of a foam ball to probe surfaces of varying shapes and materials in everyday scenes including homes and offices. Our proposed model learns from our collected dataset of real-world bounces and is bootstrapped with additional information from simple physics simulations. We show on our newly collected dataset that our model out-performs baselines, including trajectory fitting with Newtonian physics, in predicting post-bounce trajectories and inferring physical properties of a scene.

[1]  Martial Hebert,et al.  An Uncertain Future: Forecasting from Static Images Using Variational Autoencoders , 2016, ECCV.

[2]  Jiajun Wu,et al.  Generative Modeling of Audible Shapes for Object Perception , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[3]  David J. Fleet,et al.  Estimating contact dynamics , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Jiajun Wu,et al.  Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning , 2015, NIPS.

[5]  Rob Fergus,et al.  Learning Physical Intuition of Block Towers by Example , 2016, ICML.

[6]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[7]  Irfan A. Essa,et al.  Leveraging Contextual Cues for Generating Basketball Highlights , 2016, ACM Multimedia.

[8]  Scott Cohen,et al.  Forecasting Human Dynamics from Static Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Ali Farhadi,et al.  Visual Semantic Planning Using Deep Successor Representations , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Abhinav Gupta,et al.  Learning to fly by crashing , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[11]  Steven M. Seitz,et al.  Computing the Physical Parameters of Rigid-Body Motion from Video , 2002, ECCV.

[12]  Razvan Pascanu,et al.  Visual Interaction Networks: Learning a Physics Simulator from Video , 2017, NIPS.

[13]  D. Stoianovici,et al.  A Critical Study of the Applicability of Rigid-Body Collision Theory , 1996 .

[14]  Jiajun Wu,et al.  Physics 101: Learning Physical Object Properties from Unlabeled Videos , 2016, BMVC.

[15]  A. Ruina,et al.  A New Algebraic Rigid-Body Collision Law Based on Impulse Space Considerations , 1998 .

[16]  Razvan Pascanu,et al.  Interaction Networks for Learning about Objects, Relations and Physics , 2016, NIPS.

[17]  Wojciech Matusik,et al.  Dynamics-aware numerical coarsening for fabrication design , 2017, ACM Trans. Graph..

[18]  T. R. Hughes,et al.  Mathematical foundations of elasticity , 1982 .

[19]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[20]  William B. Nordgren Flexible simulation (Flexsim) software: Flexsim simulation environment , 2003, WSC '03.

[21]  Chenfanfu Jiang,et al.  Inferring Forces and Learning Human Utilities from Videos , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Dinesh K. Pai,et al.  Bounce maps , 2017, ACM Trans. Graph..

[23]  Antonis A. Argyros,et al.  Binding Vision to Physics Based Simulation: The Case Study of a Bouncing Ball , 2011 .

[24]  Abhinav Gupta,et al.  Designing deep networks for surface normal estimation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Abhinav Gupta,et al.  Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[26]  Jiajun Wu,et al.  Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks , 2016, NIPS.

[27]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[28]  Andrew Owens,et al.  Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Allan D. Jepson,et al.  The Computational Perception of Scene Dynamics , 1997, Comput. Vis. Image Underst..

[30]  David J. Fleet,et al.  Physics-Based Person Tracking Using the Anthropomorphic Walker , 2010, International Journal of Computer Vision.

[31]  Abhinav Gupta,et al.  The Curious Robot: Learning Visual Representations via Physical Interactions , 2016, ECCV.

[32]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[33]  D. Stewart Dynamics with Inequalities: Impacts and Hard Constraints , 2011 .

[34]  Sergey Levine,et al.  End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[35]  Song-Chun Zhu,et al.  Understanding tools: Task-oriented object modeling, learning and recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Niloy J. Mitra,et al.  Learning A Physical Long-term Predictor , 2017, ArXiv.

[38]  Noah Snavely,et al.  OpenSurfaces , 2013, ACM Trans. Graph..

[39]  Ali Farhadi,et al.  Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Ali Farhadi,et al.  "What Happens If..." Learning to Predict the Effect of Forces in Images , 2016, ECCV.

[41]  Kathrin Abendroth,et al.  Nonlinear Finite Elements For Continua And Structures , 2016 .

[42]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[43]  Joshua B. Tenenbaum,et al.  A Compositional Object-Based Approach to Learning Physical Dynamics , 2016, ICLR.

[44]  Jitendra Malik,et al.  Learning to Poke by Poking: Experiential Learning of Intuitive Physics , 2016, NIPS.

[45]  Antonio Torralba,et al.  Generating the Future with Adversarial Transformers , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Niloy J. Mitra,et al.  SMASH: physics-guided reconstruction of collisions from videos , 2016, ACM Trans. Graph..

[47]  S. Levine,et al.  Predictive Visual Models of Physics for Playing Billiards , 2015 .

[48]  David J. Fleet,et al.  The Kneed Walker for human pose tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[49]  Niloy J. Mitra,et al.  Learning to Represent Mechanics via Long-term Extrapolation and Interpolation , 2017, ArXiv.

[50]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[51]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[52]  Abhinav Gupta,et al.  Marr Revisited: 2D-3D Alignment via Surface Normal Prediction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).