Dynamic Modeling of Hand-Object Interactions via Tactile Sensing

Tactile sensing is critical for humans to perform everyday tasks. While significant progress has been made in analyzing object grasping from vision, it remains unclear how tactile sensing can be used to reason about and model the dynamics of hand-object interactions. In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diverse set of objects. We propose a framework that predicts the 3D locations of both the hand and the object purely from touch data by combining a predictive model and a contrastive learning module. This framework can reason about interaction patterns from the tactile data, hallucinate changes in the environment, estimate the uncertainty of its predictions, and generalize to unseen objects. We also provide detailed ablation studies of different system designs as well as visualizations of the predicted trajectories. This work takes a step toward dynamics modeling of hand-object interactions from dense tactile sensing, which opens the door to future applications in activity learning, human-computer interaction, and imitation learning for robotics.
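
The abstract does not spell out the objective used by the contrastive learning module. As an illustration only, the sketch below shows an InfoNCE-style loss, one common choice for contrastive learning, computed for a single anchor embedding against a set of candidate embeddings. The function name `info_nce`, the cosine similarity, and the `temperature` value are assumptions made for this example and are not taken from the paper.

```python
import math


def info_nce(anchor, candidates, positive_index, temperature=0.1):
    """InfoNCE-style loss for one anchor against a list of candidates.

    Illustrative only: the embeddings are plain lists of floats, and the
    similarity measure (cosine) and temperature are assumed, not sourced.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    # Scaled similarity between the anchor and every candidate.
    logits = [cosine(anchor, c) / temperature for c in candidates]

    # Log-sum-exp with the maximum subtracted for numerical stability.
    largest = max(logits)
    log_denominator = largest + math.log(sum(math.exp(x - largest) for x in logits))

    # Negative log-probability of picking the positive candidate.
    return log_denominator - logits[positive_index]


# Hypothetical embeddings: candidate 0 is close to the anchor, the others are not.
anchor = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
loss_matched = info_nce(anchor, candidates, positive_index=0)
loss_mismatched = info_nce(anchor, candidates, positive_index=1)
```

In this toy setup the loss is small when the designated positive is the candidate most similar to the anchor and large otherwise, which is the behavior a contrastive module relies on to pull matching pairs together in embedding space.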
