gradSim: Differentiable simulation for system identification and visuomotor control

We consider the problem of estimating an object’s physical properties, such as mass, friction, and elasticity, directly from video sequences. This system identification problem is fundamentally ill-posed because information is lost during image formation. Current solutions require precise 3D labels, which are labor-intensive to gather and infeasible to create for many systems, such as deformable solids or cloth. We present ∇Sim, a framework that overcomes the dependence on 3D supervision by leveraging differentiable multiphysics simulation and differentiable rendering to jointly model the evolution of scene dynamics and image formation. This novel combination enables backpropagation from pixels in a video sequence through to the underlying physical attributes that generated them. Moreover, our unified computation graph, spanning from the dynamics through the rendering process, enables learning in challenging visuomotor control tasks without relying on state-based (3D) supervision, while obtaining performance competitive with or better than techniques that rely on precise 3D labels.
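To make the idea of differentiating from pixels back to a physical parameter concrete, here is a minimal toy sketch. It is not the ∇Sim implementation. It assumes a 1D point mass pushed by a known constant force, a hypothetical "renderer" that draws a soft Gaussian blob, and forward-mode dual numbers in place of the reverse-mode backpropagation that a real framework would use. All names (`Dual`, `simulate`, `render`, `pixel_loss`, `estimate_mass`) and all constants are illustrative assumptions.

```python
import math


class Dual:
    """Value plus derivative with respect to the unknown mass (forward-mode autodiff)."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __sub__(self, o):
        o = Dual.lift(o)
        return Dual(self.val - o.val, self.dot - o.dot)

    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

    def __truediv__(self, o):
        o = Dual.lift(o)
        return Dual(self.val / o.val,
                    (self.dot * o.val - self.val * o.dot) / o.val ** 2)

    def __rtruediv__(self, o):
        return Dual.lift(o) / self


def dexp(x):
    e = math.exp(x.val)
    return Dual(e, e * x.dot)


# Illustrative constants (assumptions, not taken from the paper).
FORCE, DT, STEPS = 1.0, 0.1, 10
WIDTH, SIGMA, X0 = 16, 0.1, 0.2
TRUE_MASS = 2.0


def simulate(mass):
    """Differentiable physics: semi-implicit Euler steps under a constant force."""
    x, v, xs = Dual(X0), Dual(0.0), []
    for _ in range(STEPS):
        v = v + (FORCE / mass) * DT
        x = x + v * DT
        xs.append(x)
    return xs


def render(x):
    """Differentiable 'rendering': a 1D image holding a Gaussian blob centred at x."""
    img = []
    for i in range(WIDTH):
        diff = x - (i + 0.5) / WIDTH
        img.append(dexp(diff * diff * (-0.5 / SIGMA ** 2)))
    return img


def pixel_loss(mass, target):
    """Sum of squared pixel differences between the simulated and target videos."""
    total = Dual(0.0)
    for x, ref in zip(simulate(mass), target):
        for p, q in zip(render(x), ref):
            total = total + (p - q) * (p - q)
    return total


def estimate_mass(target, guess=1.0, lr=0.005, iters=300):
    """Gradient descent on the mass using d(loss)/d(mass) carried by the dual part."""
    m = guess
    for _ in range(iters):
        m -= lr * pixel_loss(Dual(m, 1.0), target).dot
    return m


# Target "video" generated with the true mass; only pixel values are kept.
target = [[p.val for p in render(x)] for x in simulate(Dual(TRUE_MASS))]
m_est = estimate_mass(target)
print("estimated mass:", m_est)
```

The optimizer never sees positions or velocities, only pixel intensities. The gradient reaches the mass because both the simulation step and the blob renderer are smooth functions of their inputs. A hard, one-pixel-wide renderer would give zero gradient almost everywhere, which is why differentiable renderers soften the image formation step.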
