Human-robot skills transfer interfaces for a flexible surgical robot

In minimally invasive surgery, tools go through narrow openings and manipulate soft organs to perform surgical tasks. Current robot-assisted surgical systems are limited by the rigidity of their tools. The aim of the STIFF-FLOP European project is to develop a soft robotic arm to perform surgical tasks. The flexibility of the robot allows the surgeon to navigate around organs, reach remote areas inside the body, and perform challenging procedures in laparoscopy. This article addresses the problem of designing learning interfaces that enable the transfer of skills from human demonstration. Robot programming by demonstration encompasses a wide range of learning strategies, from simple mimicking of the demonstrator's actions to higher-level imitation of the underlying intent extracted from the demonstrations. Focusing on the latter, we study the problem of extracting an objective function that explains the demonstrations from an over-specified set of candidate reward functions, and of using this information for self-refinement of the skill. In contrast to inverse reinforcement learning strategies that attempt to explain the observations with reward functions defined over the entire task (or with a set of pre-defined reward profiles active in different parts of the task), the proposed approach is based on context-dependent reward-weighted learning, where the robot learns the relevance of each candidate objective function with respect to the current phase of the task or the encountered situation. The robot then exploits this information to refine the skill in the policy parameter space. The proposed approach is tested in simulation with a cutting task performed by the STIFF-FLOP flexible robot, using kinesthetic demonstrations recorded with a Barrett WAM manipulator.
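The core loop of reward-weighted policy search in parameter space can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the candidate objectives, relevance weights, sample counts, and exploration schedule below are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward_weighted_update(mean, std, candidates, relevance, n_samples=60):
    """One reward-weighted update of a policy parameter distribution.

    candidates: list of callables theta -> scalar (candidate objective functions)
    relevance:  context-dependent weight of each candidate objective
    """
    # Explore in policy parameter space around the current estimate
    samples = mean + std * rng.standard_normal((n_samples, mean.size))
    # Score each sample with the relevance-weighted sum of candidate rewards
    returns = np.array([sum(w * r(s) for w, r in zip(relevance, candidates))
                        for s in samples])
    # Exponential transformation turns returns into importance weights
    weights = np.exp((returns - returns.max()) / (returns.std() + 1e-9))
    weights /= weights.sum()
    # The new mean is the reward-weighted average of the sampled parameters
    return weights @ samples

# Hypothetical candidate objectives (assumed for illustration only):
r_reach = lambda th: -(th[0] - 2.0) ** 2  # e.g. reach a target cut position
r_align = lambda th: -(th[1] + 1.0) ** 2  # e.g. keep the tool aligned

mean, std = np.zeros(2), 1.0
for _ in range(40):
    mean = reward_weighted_update(mean, std, [r_reach, r_align],
                                  relevance=[1.0, 1.0])
    std *= 0.9  # shrink exploration as the policy converges
```

In the full approach, the relevance weights would not be fixed constants as above but would be estimated from the demonstrations for each phase of the task; the update itself stays the same.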
