Dynamics Learning With Object-Centric Interaction Networks for Robot Manipulation

Understanding how objects physically interact with their environment is critical for multi-object robotic manipulation. A predictive dynamics model forecasts the future states of manipulated objects, and these forecasts can be used to plan actions that bring the objects to desired goal states. However, most current approaches to learning dynamics from high-dimensional visual observations have limitations: they either rely on a large amount of real-world data or build a model for a fixed number of objects, which makes it difficult for them to generalize to unseen objects. This paper proposes a Deep Object-centric Interaction Network (DOIN), which encodes object-centric representations of multiple objects from raw RGB images and reasons about the future trajectory of each object in latent space. The proposed model is trained only on large amounts of random interaction data collected in simulation. Combined with a model predictive control framework, the learned model enables a robot to search for action sequences that manipulate objects into desired configurations. The proposed method is evaluated in both simulation and real-world experiments on multi-object pushing tasks. Extensive simulation experiments show that DOIN achieves high prediction accuracy in different scenes with different numbers of objects and outperforms state-of-the-art baselines in the manipulation tasks. Real-world experiments demonstrate that the model trained on simulated data can be transferred to a real robot and can successfully perform multi-object pushing tasks on previously unseen objects with significant variations in shape and size.
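
The pipeline described in the abstract (per-object latent states, pairwise interaction reasoning, and action search under model predictive control) can be illustrated with a minimal NumPy sketch. This is not the DOIN architecture: the image encoder is omitted, the weights are random rather than learned, and every name and size here (`predict_next`, `plan_cem`, `D`, `A`, `E`) is hypothetical. The cross-entropy method is used as one common choice of sampling-based optimizer; the abstract does not say which optimizer the authors use. The sketch only shows why sharing one pairwise function across all object pairs lets the same model handle any number of objects.

```python
import numpy as np

# Hypothetical sizes: latent dims per object, action dims, effect dims.
D, A, E = 4, 2, 8

def init_params(rng):
    """Random, untrained weights standing in for a learned model."""
    return {
        "W_rel": rng.normal(0.0, 0.1, (2 * D, E)),      # pairwise effect function
        "W_obj": rng.normal(0.0, 0.1, (D + E + A, D)),  # per-object update function
    }

def predict_next(params, states, action):
    """One latent step. states: (N, D), action: (A,). Works for any N."""
    n = states.shape[0]
    recv = np.repeat(states, n, axis=0)   # row i*n+j holds receiver i
    send = np.tile(states, (n, 1))        # row i*n+j holds sender j
    pairs = np.concatenate([recv, send], axis=1)
    effects = np.tanh(pairs @ params["W_rel"]).reshape(n, n, E)
    mask = 1.0 - np.eye(n)[:, :, None]    # drop self-interactions
    agg = (effects * mask).sum(axis=1)    # sum effects over senders: (N, E)
    act = np.broadcast_to(action, (n, A))
    delta = np.concatenate([states, agg, act], axis=1) @ params["W_obj"]
    return states + delta                 # residual update of each object

def rollout_cost(params, states, actions, goal):
    """Roll the model forward and score squared distance to the goal states."""
    for a in actions:
        states = predict_next(params, states, a)
    return float(np.sum((states - goal) ** 2))

def plan_cem(params, states, goal, horizon=5, samples=64, elites=8,
             iters=4, seed=0):
    """Cross-entropy-method search over action sequences of shape (horizon, A)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, A))
    std = np.ones((horizon, A))
    # Start from the all-zero plan so the result is never worse than doing nothing.
    best_plan = mean.copy()
    best_cost = rollout_cost(params, states, mean, goal)
    for _ in range(iters):
        cand = mean + std * rng.normal(size=(samples, horizon, A))
        costs = np.array([rollout_cost(params, states, c, goal) for c in cand])
        order = np.argsort(costs)
        elite = cand[order[:elites]]
        # Refit the sampling distribution to the lowest-cost candidates.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
        if costs[order[0]] < best_cost:
            best_plan = cand[order[0]].copy()
            best_cost = float(costs[order[0]])
    return best_plan, best_cost

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    for n_objects in (2, 5):  # the same weights serve both scene sizes
        s = rng.normal(size=(n_objects, D))
        g = rng.normal(size=(n_objects, D))
        plan, cost = plan_cem(params, s, g)
        print(n_objects, plan.shape, cost)
```

In a model predictive control loop, only the first action of the returned plan would be executed before re-encoding the scene and replanning.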
