Teaching a Robot Tasks of Arbitrary Complexity via Human Feedback

This paper addresses the problem of training a robot to carry out temporal tasks of arbitrary complexity using evaluative human feedback that may be inaccurate. A key idea explored in our work is a form of curriculum learning: training the robot to master simple tasks and then building up to more complex ones. We show how a training procedure that exploits knowledge of the formal task representation can decompose any task and train it efficiently in the size of that representation. We further provide a set of experiments supporting the claim that non-expert human trainers can decompose tasks in a way that is consistent with our theoretical results; more than half of the participants successfully trained all of our experimental missions. Finally, we compare our algorithm with existing approaches, and our experimental results suggest that our method outperforms the alternatives, especially when the feedback contains mistakes.
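
To make the two core ideas in the abstract concrete, here is a minimal illustrative sketch, not the paper's actual algorithm: a learner masters a curriculum of subtasks in order of increasing complexity, using only a simulated trainer's +1/-1 evaluative feedback that is flipped with some error probability. All names (`train_subtask`, `train_curriculum`, the toy subtasks) are hypothetical and chosen only for illustration.

```python
import random

def noisy_feedback(correct: bool, noise: float) -> int:
    """Simulated evaluative feedback: +1 for a correct action, -1 otherwise,
    with each label flipped independently with probability `noise`."""
    label = 1 if correct else -1
    return -label if random.random() < noise else label

def train_subtask(n_states, goal_action, noise=0.1, episodes=200, lr=0.5):
    """Learn one subtask from noisy evaluative feedback alone.

    `goal_action(s)` is the (hidden) correct action in state s; the learner
    only ever observes the trainer's noisy +1/-1 signal.
    """
    q = [[0.0, 0.0] for _ in range(n_states)]  # two actions per state
    for _ in range(episodes):
        for s in range(n_states):
            # epsilon-greedy action choice over the running feedback estimates
            if random.random() < 0.1:
                a = random.choice([0, 1])
            else:
                a = max((0, 1), key=lambda x: q[s][x])
            f = noisy_feedback(a == goal_action(s), noise)
            q[s][a] += lr * (f - q[s][a])  # move estimate toward feedback
    return [max((0, 1), key=lambda a: q[s][a]) for s in range(n_states)]

def train_curriculum(subtasks, noise=0.1):
    """Curriculum idea from the abstract: master simple subtasks first,
    then build up toward the full task (ordering supplied by the trainer)."""
    return [train_subtask(n, g, noise) for n, g in subtasks]

if __name__ == "__main__":
    random.seed(0)
    # Two toy subtasks, simple -> complex: "always pick action 0",
    # then "alternate actions by state parity".
    curriculum = [(4, lambda s: 0), (8, lambda s: s % 2)]
    for i, pi in enumerate(train_curriculum(curriculum, noise=0.2)):
        print(f"subtask {i}: learned policy {pi}")
```

Even with 20% of the feedback labels flipped, the expected feedback for the correct action stays positive, so the per-state argmax recovers the right policy; this is the sense in which learning from evaluative feedback can tolerate trainer mistakes.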
