Interpretability in Contact-Rich Manipulation via Kinodynamic Images

Deep Neural Networks (NNs) have been widely utilized in contact-rich manipulation tasks to model the complicated contact dynamics. However, NN-based models are often difficult to decipher, which can lead to seemingly inexplicable behaviors and unidentifiable failure cases. In this work, we address the interpretability of NN-based models by introducing kinodynamic images: a methodology that constructs images from the kinematic and dynamic data of contact-rich manipulation tasks. By using images as the state representation, we enable the application of interpretability modules that were previously limited to vision-based tasks. We train a Convolutional Neural Network (CNN) on this representation and extract interpretations with Grad-CAM to produce visual explanations. Our method is versatile: it can be applied to any classification problem in manipulation tasks, regardless of the features used, to visually interpret which parts of the input drive the model's decisions and to distinguish its failure modes. Our experiments demonstrate that the method enables both detailed visual inspection of individual sequences within a task and high-level evaluation of a model's behavior. Code for this work is available at [1].
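As a rough illustration of the pipeline described above, the sketch below is a minimal example and not the authors' implementation: the function names, the network layout, and the exact signal-to-image mapping are assumptions made for illustration, assuming PyTorch. It stacks a window of kinematic and dynamic signals into a 2D grid, classifies it with a small CNN, and computes a Grad-CAM heatmap over the final convolutional feature maps so that the influential signal/time regions can be visualized.

```python
# Minimal sketch (not the paper's code): kinodynamic-style "images" from signal
# windows, a small CNN classifier, and a Grad-CAM heatmap over its conv features.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


def signals_to_image(window: np.ndarray) -> torch.Tensor:
    """window: (num_signals, num_timesteps) array of kinematic/dynamic data.
    Returns a (1, 1, H, W) tensor, each signal normalized to zero mean, unit variance."""
    w = (window - window.mean(axis=1, keepdims=True)) / (window.std(axis=1, keepdims=True) + 1e-8)
    return torch.from_numpy(w).float().unsqueeze(0).unsqueeze(0)


class SmallCNN(nn.Module):
    """Illustrative classifier; the paper's architecture may differ."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        feats = self.features(x)                         # conv feature maps used by Grad-CAM
        logits = self.fc(self.pool(feats).flatten(1))
        return logits, feats


def grad_cam(model: SmallCNN, image: torch.Tensor, target_class: int) -> torch.Tensor:
    """Return a normalized (H, W) Grad-CAM heatmap for `target_class` on one image."""
    model.eval()
    logits, feats = model(image)
    feats.retain_grad()                                  # keep gradients of the conv feature maps
    logits[0, target_class].backward()
    weights = feats.grad.mean(dim=(2, 3), keepdim=True)  # channel weights = spatially averaged gradients
    cam = F.relu((weights * feats).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach()


# Usage: classify a window of 12 signals over 64 timesteps and inspect which
# signal/time regions the (here untrained, randomly initialized) CNN attends to.
window = np.random.randn(12, 64)
image = signals_to_image(window)
model = SmallCNN(num_classes=2)
with torch.no_grad():
    predicted = int(model(image)[0].argmax())
heatmap = grad_cam(model, image, target_class=predicted)
print(heatmap.shape)  # torch.Size([12, 64])
```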

[1] David E. Smith et al., "Designing Environments Conducive to Interpretable Robot Behavior," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020.
[2] David Hsu et al., "Push-Net: Deep Planar Pushing for Objects with Unknown Physical Properties," Robotics: Science and Systems (RSS), 2018.
[3] Jakub W. Pachocki et al., "Learning dexterous in-hand manipulation," International Journal of Robotics Research, 2018.
[4] Pascal Vincent et al., "Visualizing Higher-Layer Features of a Deep Network," 2009.
[5] A. Aldo Faisal et al., "Dot-to-Dot: Explainable Hierarchical Reinforcement Learning for Robotic Manipulation," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019.
[6] Bradley Hayes et al., "Interpretable models for fast activity recognition and anomaly explanation during collaborative robotics tasks," IEEE International Conference on Robotics and Automation (ICRA), 2017.
[7] Sergey Levine et al., "Manipulation by Feel: Touch-Based Control with Deep Predictive Models," IEEE International Conference on Robotics and Automation (ICRA), 2019.
[8] Danica Kragic et al., "Modelling and Learning Dynamics for Robotic Food-Cutting," IEEE 17th International Conference on Automation Science and Engineering (CASE), 2021.
[9] Ali Farhadi et al., "Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects," IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[10] Danica Kragic et al., "Data-Driven Grasp Synthesis—A Survey," IEEE Transactions on Robotics, 2013.
[11] Zoe Doulgeri et al., "Slippage Detection Generalizing to Grasping of Unknown Objects Using Machine Learning With Novel Features," IEEE Robotics and Automation Letters, 2018.
[12] Martin Wattenberg et al., "Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)," International Conference on Machine Learning (ICML), 2017.
[13] Raia Hadsell et al., "From Pixels to Percepts: Highly Robust Edge Perception and Contour Following Using Deep Learning and an Optical Biomimetic Tactile Sensor," IEEE Robotics and Automation Letters, 2018.
[14] Byron Boots et al., "Robust Learning of Tactile Force Estimation through Robot Interaction," IEEE International Conference on Robotics and Automation (ICRA), 2019.
[15] Zhe L. Lin et al., "Top-Down Neural Attention by Excitation Backprop," International Journal of Computer Vision, 2016.
[16] Andrew Zisserman et al., "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps," International Conference on Learning Representations (ICLR), 2013.
[17] C. Karen Liu et al., "Deep Haptic Model Predictive Control for Robot-Assisted Dressing," IEEE International Conference on Robotics and Automation (ICRA), 2018.
[18] Dieter Fox et al., "Prospection: Interpretable plans from language by predicting the future," IEEE International Conference on Robotics and Automation (ICRA), 2019.
[19] Rob Fergus et al., "Visualizing and Understanding Convolutional Networks," European Conference on Computer Vision (ECCV), 2013.
[20] Been Kim et al., "Sanity Checks for Saliency Maps," Advances in Neural Information Processing Systems (NeurIPS), 2018.
[21] Maria Bauza et al., "A probabilistic data-driven model for planar pushing," IEEE International Conference on Robotics and Automation (ICRA), 2017.
[22] Abhishek Das et al., "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization," IEEE International Conference on Computer Vision (ICCV), 2017.
[23] Wojciech Samek et al., "Methods for interpreting and understanding deep neural networks," Digital Signal Processing, 2017.
[24] Sergey Levine et al., "Deep Dynamics Models for Learning Dexterous Manipulation," Conference on Robot Learning (CoRL), 2019.
[25] John Kenneth Salisbury et al., "Learning to represent haptic feedback for partially-observable tasks," IEEE International Conference on Robotics and Automation (ICRA), 2017.
[26] Alex Lascarides et al., "Disentangled Relational Representations for Explaining and Learning from Demonstration," Conference on Robot Learning (CoRL), 2019.
[27] Radu Grosu et al., "Designing Worm-inspired Neural Networks for Interpretable Robotic Control," IEEE International Conference on Robotics and Automation (ICRA), 2019.
[28] Silvio Savarese et al., "Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks," IEEE International Conference on Robotics and Automation (ICRA), 2019.
[29] Tatsuhiko Tsunoda et al., "DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture," Scientific Reports, 2019.