Efficient behavior learning in human–robot collaboration

We present a novel method that lets a robot learn a joint human–robot task interactively, while executing it. We consider collaborative tasks carried out by a team of a human operator and a robot helper that adapts to the human's task-execution preferences. Different operators have different abilities, experience, and personal preferences, so one allocation of activities within the team may be preferred over another. Our main goal is for the robot to learn both the task and the user's preferences, yielding a joint task execution that is more efficient and more acceptable to the user. We cast concurrent multi-agent collaboration as a semi-Markov decision process and show how to model the team behavior and learn the robot behavior the human expects. We further propose an interactive learning framework, which we evaluate both in simulation and on a real robotic setup, showing that the system effectively learns and adapts to human expectations.
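Since the abstract casts concurrent collaboration as a semi-Markov decision process, the sketch below illustrates the one ingredient specific to that setting: a Q-learning backup over durative actions, where an action lasting tau time steps discounts future value by gamma**tau. This is a minimal toy, not the paper's method; the task, the state encoding, and all names (smdp_q_update, fetch_part, and so on) are our own illustrative assumptions, and the preference-learning component of the paper is not modeled here.

```python
import random
from collections import defaultdict

GAMMA = 0.95   # discount per unit of time
ALPHA = 0.1    # learning rate

def smdp_q_update(Q, s, a, r, tau, s2, actions, done):
    # SMDP backup: reward r was accrued over tau time steps, so the
    # value of the next state is discounted by gamma**tau.
    best_next = 0.0 if done else max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += ALPHA * (r + (GAMMA ** tau) * best_next - Q[(s, a)])

# Hypothetical toy task: the robot must fetch two parts; a simulated
# human partner assembles them as soon as both are available.
ACTIONS = {"fetch_part": 2, "wait": 1}    # action -> duration tau

def step(parts, action):
    tau = ACTIONS[action]
    if action == "fetch_part" and parts < 2:
        parts += 1
    done = parts == 2                      # human finishes the assembly
    reward = 10.0 if done else -0.1 * tau  # idling wastes team time
    return parts, reward, tau, done

def run_episode(Q, eps=0.2):
    parts, done = 0, False
    while not done:
        if random.random() < eps:
            action = random.choice(list(ACTIONS))
        else:
            action = max(ACTIONS, key=lambda a: Q[(parts, a)])
        nxt, reward, tau, done = step(parts, action)
        smdp_q_update(Q, parts, action, reward, tau, nxt, ACTIONS, done)
        parts = nxt

Q = defaultdict(float)
for _ in range(500):
    run_episode(Q)
print(max(ACTIONS, key=lambda a: Q[(0, a)]))  # expect "fetch_part"
```

A tabular Q-function suffices for this toy; the essential semi-Markov piece is the gamma**tau discounting over actions of variable duration, which is what allows activities of different lengths, executed concurrently with the human's, to be compared on equal footing.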
