Functional Contour-following via Haptic Perception and Reinforcement Learning

Many tasks involve the fine manipulation of objects despite limited visual feedback. In such scenarios, tactile and proprioceptive feedback can be leveraged for task completion. We present an approach for real-time haptic perception and decision-making for a haptics-driven, functional contour-following task: the closure of a ziplock bag. This task is challenging for robots because the bag is deformable, transparent, and visually occluded by artificial fingertip sensors that are also compliant. A deep neural network classifier was trained to estimate the state of a zipper within a robot's pinch grasp. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards by balancing exploration against exploitation of the state-action space. The C-MAB learner outperformed a benchmark Q-learner by exploring the state-action space more efficiently while learning a hard-to-code task. The learned C-MAB policy was tested on novel ziplock bag scenarios and on other contours (wire, rope). Importantly, this work contributes to the development of reinforcement learning approaches that account for limited resources such as hardware life and researcher time. As robots are used to perform complex, physically interactive tasks in unstructured or unmodeled environments, it becomes important to develop methods that enable efficient and effective learning with physical testbeds.
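To make the exploration-exploitation trade-off concrete, the following is a minimal sketch of a tabular contextual bandit, not the implementation used in this work. The abstract does not state the exploration rule, the context encoding, the action set, or the reward, so everything here is an assumption for illustration: contexts stand in for discrete zipper-state classes from the classifier, actions for discrete fingertip motions, rewards are binary, and Beta-Bernoulli Thompson sampling is swapped in as the exploration rule. The names `ContextualBandit` and `simulate`, and the toy success probabilities, are hypothetical.

```python
import random


class ContextualBandit:
    """Tabular contextual bandit using Beta-Bernoulli Thompson sampling.

    Assumption: contexts and actions are small discrete sets and rewards
    are binary (1 = success, 0 = failure).
    """

    def __init__(self, n_contexts, n_actions, seed=0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        # One Beta(alpha, beta) posterior per (context, action), starting uniform.
        self.alpha = [[1.0] * n_actions for _ in range(n_contexts)]
        self.beta = [[1.0] * n_actions for _ in range(n_contexts)]

    def select_action(self, context):
        # Sample a plausible success rate for each action and act greedily on
        # the samples; uncertain actions are sometimes sampled high (exploration).
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.alpha[context], self.beta[context])
        ]
        return max(range(self.n_actions), key=samples.__getitem__)

    def update(self, context, action, reward):
        # Bayesian update of the posterior for the pair that was just tried.
        self.alpha[context][action] += reward
        self.beta[context][action] += 1 - reward

    def greedy_policy(self):
        # For each context, the action with the highest posterior mean.
        policy = []
        for a_row, b_row in zip(self.alpha, self.beta):
            means = [a / (a + b) for a, b in zip(a_row, b_row)]
            policy.append(max(range(self.n_actions), key=means.__getitem__))
        return policy


def simulate(n_steps=2000, seed=1):
    """Toy environment (hypothetical): action i succeeds with probability 0.9
    in context i and with probability 0.1 in every other context."""
    env_rng = random.Random(seed)
    learner = ContextualBandit(n_contexts=3, n_actions=3, seed=seed + 1)
    total_reward = 0
    for _ in range(n_steps):
        context = env_rng.randrange(3)  # stand-in for the classifier output
        action = learner.select_action(context)
        p_success = 0.9 if action == context else 0.1
        reward = 1 if env_rng.random() < p_success else 0
        learner.update(context, action, reward)
        total_reward += reward
    return learner.greedy_policy(), total_reward / n_steps


if __name__ == "__main__":
    policy, mean_reward = simulate()
    print("greedy policy per context:", policy)
    print("mean reward:", round(mean_reward, 3))
```

Unlike a Q-learner, this learner keeps no estimate of future value: each decision is scored only by the immediate reward observed in the current context, which reduces what must be learned from physical trials.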
