Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play

Teaching dexterity to multi-fingered robots has been a longstanding challenge in robotics. Most prominent work in this area focuses on learning controllers or policies that operate either on visual observations or on state estimates derived from vision. However, such methods perform poorly on fine-grained manipulation tasks that require reasoning about contact forces or about objects occluded by the hand itself. In this work, we present T-Dex, a new approach for tactile-based dexterity that operates in two phases. In the first phase, we collect 2.5 hours of play data, which is used to train self-supervised tactile encoders. This pre-training is needed to map high-dimensional tactile readings to a lower-dimensional embedding. In the second phase, given a handful of demonstrations for a dexterous task, we learn non-parametric policies that combine the tactile observations with visual ones. Across five challenging dexterous tasks, we show that our tactile-based dexterity models outperform purely vision-based and torque-based models by an average of 1.7X. Finally, we provide a detailed analysis of the factors critical to T-Dex, including the importance of play data, architectures, and representation learning.
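
To make the second phase concrete, the following is a minimal sketch of one common form of non-parametric policy: nearest-neighbor retrieval over demonstration frames. It is an illustration under stated assumptions, not the T-Dex implementation. The abstract does not specify how tactile and visual embeddings are combined or compared, so the concatenation of L2-normalized embeddings, the Euclidean distance, and every name here (`fuse`, `nearest_neighbor_action`, `tactile_weight`) are hypothetical choices. The embeddings are assumed to come from already-trained encoders.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(tactile: Vector, visual: Vector, tactile_weight: float = 1.0) -> List[float]:
    """Build one retrieval key from a tactile and a visual embedding.

    Assumption: simple concatenation of normalized embeddings, with a
    hypothetical weight that trades off the two modalities.
    """
    weighted_tactile = [tactile_weight * x for x in l2_normalize(tactile)]
    return weighted_tactile + l2_normalize(visual)


def nearest_neighbor_action(
    query: Vector,
    demo_keys: Sequence[Vector],
    demo_actions: Sequence[Vector],
) -> Tuple[int, Vector]:
    """Return the index and action of the demonstration frame closest to the query."""
    best = min(range(len(demo_keys)), key=lambda i: math.dist(query, demo_keys[i]))
    return best, demo_actions[best]


# Toy demonstration set: three frames with made-up embeddings and actions.
demo_tactile = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
demo_visual = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
demo_actions = [[0.1, 0.0], [0.0, 0.2], [-0.1, 0.0]]
demo_keys = [fuse(t, v) for t, v in zip(demo_tactile, demo_visual)]

# At run time, embed the current observation and replay the matched action.
query = fuse([0.1, 0.9, 0.0], [0.9, 0.1])
index, action = nearest_neighbor_action(query, demo_keys, demo_actions)
```

In this sketch the policy has no trainable parameters of its own: all learning happens in the encoders, and adding a demonstration only appends rows to `demo_keys` and `demo_actions`.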
