Preference learning on the execution of collaborative human-robot tasks

We present a novel method to learn human preferences during, and for, the execution of concurrent joint human-robot tasks. We consider tasks carried out by a team consisting of a human operator and a robot helper that should adapt to the human's task-execution preferences. Different human operators have different abilities, experience, and personal preferences, so one allocation of activities within the team may be preferred over another. We cast the behavior of concurrent multi-agent cooperation as a semi-Markov decision process and show how to model and learn human preferences over team behavior. We propose two interactive learning algorithms, evaluate them, and show that the system can effectively learn and adapt to human preferences.
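The abstract does not spell out the learning algorithms themselves; as a rough illustration of the kind of interactive preference learning it describes, here is a minimal sketch in Python, assuming a linear utility over hand-crafted features of a candidate team behavior and a perceptron-style update driven by human corrections. The names `PreferenceLearner` and `behavior_features` are hypothetical, not from the paper.

```python
import numpy as np

# Hypothetical feature map: summarizes an executed team behavior
# (e.g., which agent performs which activity, expected durations)
# as a fixed-length vector. A real system would compute this from
# the task model; here behaviors are already vectors.
def behavior_features(behavior):
    return np.asarray(behavior, dtype=float)

class PreferenceLearner:
    """Linear utility over behavior features, updated from human
    feedback with a perceptron-style rule (an assumption, not the
    paper's algorithm)."""

    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)  # preference weights
        self.lr = lr                   # learning rate

    def utility(self, behavior):
        # Predicted desirability of a candidate team behavior.
        return float(self.w @ behavior_features(behavior))

    def choose(self, candidate_behaviors):
        # Propose the behavior the current model predicts the human prefers.
        return max(candidate_behaviors, key=self.utility)

    def update(self, proposed, corrected):
        # Shift weights toward the behavior the human preferred
        # over the robot's proposal.
        self.w += self.lr * (
            behavior_features(corrected) - behavior_features(proposed)
        )

# Usage: each round, the robot proposes an allocation of activities;
# the operator's correction (a preferred alternative) drives the update.
learner = PreferenceLearner(n_features=3)
candidates = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
proposal = learner.choose(candidates)
human_preferred = [0.0, 1.0, 1.0]  # feedback from the operator
learner.update(proposal, human_preferred)
```

In this style of interaction, every correction provides an implicit pairwise preference (corrected over proposed), which is enough to improve subsequent proposals without asking the operator for numeric rewards.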
