Paper Information - Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks

Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. However, it is non-trivial to manually design a robot controller that combines modalities with very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy on real robots due to sample complexity. We use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. We evaluate our method on a peg insertion task, generalizing over different geometry, configurations, and clearances, while being robust to external perturbations. We present results in simulation and on a real robot.
