Paper Information - Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces

We introduce an approach to model the surface properties that govern bounces in everyday scenes. Our model learns end-to-end, starting from sensor inputs, to predict post-bounce trajectories and to infer two underlying physical properties that govern bouncing: restitution and effective collision normals. Our model, Bounce and Learn, comprises two modules: a Physics Inference Module (PIM) and a Visual Inference Module (VIM). VIM learns to infer physical parameters for locations in a scene given a single still image, while PIM learns to model physical interactions for the prediction task given physical parameters and observed pre-collision 3D trajectories. To achieve our results, we introduce the Bounce Dataset, comprising 5K RGB-D videos of bouncing trajectories of a foam ball used to probe surfaces of varying shapes and materials in everyday scenes, including homes and offices. Our proposed model learns from our collected dataset of real-world bounces and is bootstrapped with additional information from simple physics simulations. We show on our newly collected dataset that our model outperforms baselines, including trajectory fitting with Newtonian physics, in predicting post-bounce trajectories and inferring physical properties of a scene.
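
To make the two physical quantities named in the abstract concrete, the sketch below applies the standard textbook frictionless collision rule, in which the velocity component along the collision normal is reversed and scaled by the coefficient of restitution while the tangential component is left unchanged. This is an illustrative assumption, not the learned PIM/VIM model described above; the function name `reflect_velocity` and the sample values are hypothetical.

```python
def reflect_velocity(v_in, normal, restitution):
    """Post-bounce velocity under a simple frictionless restitution model.

    Assumption: textbook rigid-body rule, not the paper's learned model.
    v_in and normal are 3D vectors given as sequences of floats.
    """
    # Normalize the collision normal so its length does not affect the result.
    length = sum(c * c for c in normal) ** 0.5
    if length == 0.0:
        raise ValueError("collision normal must be non-zero")
    n = [c / length for c in normal]

    # Signed speed along the normal (negative when moving into the surface).
    v_n = sum(v * c for v, c in zip(v_in, n))

    # Reverse the normal component and scale it by the restitution e;
    # the tangential component is unchanged:
    #   v_out = v_in - (1 + e) * (v_in . n) * n
    return [v - (1.0 + restitution) * v_n * c for v, c in zip(v_in, n)]


# Hypothetical example: a ball hits a horizontal floor (normal along +z)
# with restitution 0.5.
v_out = reflect_velocity([1.0, 0.0, -2.0], [0.0, 0.0, 1.0], 0.5)
print(v_out)
```

In this simplified rule, a restitution of 1 gives a perfectly elastic bounce and a restitution of 0 removes the normal velocity entirely. The "effective" collision normal in the abstract suggests that real bounces need not follow the geometric surface normal, which is one reason a fixed rule like this serves only as a baseline intuition.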
