Efficient Model Learning for Human-Robot Collaborative Tasks

We present a framework for learning human user models from joint-action demonstrations that enables a robot to compute a robust policy for a collaborative task with a human. Learning is fully automated, requiring no human intervention. First, demonstrated action sequences are clustered into distinct human types using an unsupervised learning algorithm. The robot then uses the demonstrations of each type to learn a reward function representative of that type via an inverse reinforcement learning algorithm. The learned model is incorporated into a Mixed Observability Markov Decision Process (MOMDP) formulation, in which the human type is a partially observable variable. With this framework, the robot can infer, either offline or online, the type of a new user not included in the training set, and can compute a policy that is aligned with that user's preferences and robust to deviations of the human's actions from the prior demonstrations. Finally, we validate the approach on data collected in human subject experiments and conduct proof-of-concept demonstrations in which a person performs a collaborative task with a small industrial robot.
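The online inference over the partially observable human type can be viewed as a Bayesian filter on the robot's belief, updated after each observed human action. The sketch below is illustrative only: the function name, the two-type setup, and the likelihood table are assumptions for the example, not quantities from the paper, which learns these models from clustered demonstrations.

```python
import numpy as np

def update_type_belief(belief, action, action_likelihoods):
    """One Bayesian filtering step over the latent human type.

    belief:             (K,) prior probability of each human type
    action:             index of the observed human action
    action_likelihoods: (K, A) array of P(action | type), one row per type
    """
    posterior = belief * action_likelihoods[:, action]
    return posterior / posterior.sum()

# Toy example: two hypothetical human types over three actions.
likelihoods = np.array([
    [0.7, 0.2, 0.1],   # type 0 strongly prefers action 0
    [0.1, 0.2, 0.7],   # type 1 strongly prefers action 2
])
belief = np.array([0.5, 0.5])       # uniform prior over types
for a in [0, 0, 1]:                 # observed action sequence
    belief = update_type_belief(belief, a, likelihoods)
print(belief)                       # belief concentrates on type 0
```

In the full MOMDP formulation this belief over the hidden type is maintained alongside the fully observable task state, and the robot's policy is computed over that joint belief.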
