DensePhysNet: Learning Dense Physical Object Representations via Multi-step Dynamic Interactions

We study the problem of learning physical object representations for robot manipulation. Understanding object physics is critical for successful manipulation, yet challenging, because physical object properties can rarely be inferred from an object's static appearance. In this paper, we propose DensePhysNet, a system that actively executes a sequence of dynamic interactions (e.g., sliding and colliding) and uses a deep predictive model over its visual observations to learn dense, pixel-wise representations that reflect the physical properties of the observed objects. Our experiments in both simulated and real settings demonstrate that the learned representations carry rich physical information and can be directly decoded into physical object properties such as friction and mass. The dense representation enables DensePhysNet to generalize to novel scenes containing more objects than seen during training. Equipped with this knowledge of object physics, the learned representations also lead to more accurate and efficient manipulation in downstream tasks than state-of-the-art methods.
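To make the decoding step concrete, below is a minimal PyTorch sketch of the two components the abstract describes: a fully convolutional encoder that maps an RGB observation to a per-pixel descriptor map, and a small readout head that decodes the descriptor at an object's pixel location into physical properties. This is an illustrative sketch, not the authors' actual architecture; the module names (DenseDescriptorNet, PropertyDecoder), descriptor dimension, and layer sizes are assumptions.

import torch
import torch.nn as nn

class DenseDescriptorNet(nn.Module):
    """Fully convolutional encoder: RGB image -> D-dim descriptor per pixel.
    (Illustrative backbone; the paper's encoder is more elaborate.)"""
    def __init__(self, descriptor_dim=16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, descriptor_dim, kernel_size=1),  # 1x1 conv -> per-pixel descriptor
        )

    def forward(self, rgb):            # rgb: (B, 3, H, W)
        return self.backbone(rgb)      # (B, D, H, W), one descriptor per pixel

class PropertyDecoder(nn.Module):
    """Decodes a single pixel's descriptor into physical properties,
    e.g., [friction, mass] as in the abstract."""
    def __init__(self, descriptor_dim=16, num_properties=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(descriptor_dim, 64), nn.ReLU(),
            nn.Linear(64, num_properties),
        )

    def forward(self, descriptor):     # descriptor: (B, D)
        return self.head(descriptor)   # (B, num_properties)

# Usage: decode the descriptor at a pixel lying on an object.
encoder, decoder = DenseDescriptorNet(), PropertyDecoder()
image = torch.randn(1, 3, 240, 320)   # one RGB observation (random stand-in)
feat = encoder(image)                 # (1, 16, 240, 320)
u, v = 120, 160                       # hypothetical pixel on the object
props = decoder(feat[:, :, u, v])     # predicted [friction, mass] at that pixel

The design choice mirrored here is that properties are read out per pixel rather than per scene, which is what lets a dense representation scale to novel scenes with more objects than seen during training.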
