Learning to Identify Object Instances by Touch: Tactile Recognition via Multimodal Matching

Much of the literature on robotic perception focuses on the visual modality. Vision provides a global observation of a scene, making it broadly useful. However, in the domain of robotic manipulation, vision alone can sometimes prove inadequate: in the presence of occlusions or poor lighting, visual object identification might be difficult. The sense of touch can provide robots with an alternative mechanism for recognizing objects. In this paper, we study the problem of touch-based instance recognition. We propose a novel framing of the problem as multimodal recognition: the goal of our system is to recognize, given a visual and a tactile observation, whether or not these observations correspond to the same object. To our knowledge, our work is the first to address this type of multimodal instance recognition problem at such a large scale, with our analysis spanning 98 different objects. We employ a robot equipped with two GelSight touch sensors, one on each finger, and a self-supervised, autonomous data collection procedure to collect a dataset of tactile observations and images. Our experimental results show that it is possible to accurately recognize object instances by touch alone, including instances of novel objects that were never seen during training. Our learned model outperforms alternative methods on this challenging task, including human volunteers.
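The matching formulation described above can be illustrated with a minimal sketch. This is a hypothetical toy example, not the paper's actual architecture: each modality is mapped by its own (here, linear) encoder into a shared embedding space, and a match score is computed as the cosine similarity between the two embeddings; a threshold on that score yields the same-object / different-object decision. In practice the encoders would be convolutional networks trained on image and GelSight data.

```python
import numpy as np

def embed(x, W):
    """Project a feature vector into the shared space and L2-normalize.

    W stands in for a learned modality-specific encoder (an assumption
    of this sketch; the real system would use a trained ConvNet).
    """
    z = W @ x
    return z / np.linalg.norm(z)

def match_score(img_feat, tac_feat, W_img, W_tac):
    """Cosine similarity between visual and tactile embeddings, in [-1, 1]."""
    return float(embed(img_feat, W_img) @ embed(tac_feat, W_tac))

def same_object(img_feat, tac_feat, W_img, W_tac, threshold=0.5):
    """Binary same-object decision by thresholding the match score."""
    return match_score(img_feat, tac_feat, W_img, W_tac) >= threshold

# Toy usage with random 4-D "features" and identity "encoders".
rng = np.random.default_rng(0)
img, tac = rng.normal(size=4), rng.normal(size=4)
score = match_score(img, tac, np.eye(4), np.eye(4))
assert -1.0 <= score <= 1.0
```

The key design choice this sketch mirrors is that recognition is posed as verification (does this pair match?) rather than classification over a fixed label set, which is what allows the system to handle novel objects never seen during training.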
