Assisted Inverse Reinforcement Learning

We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher. More formally, we tackle the following question: how can a teacher provide an informative sequence of demonstrations to an IRL agent to speed up the learning process? We prove rigorous convergence guarantees for a new iterative teaching algorithm that adaptively chooses demonstrations based on the learner’s current performance. Extensive experiments in a car-driving simulator environment show that learning can be sped up drastically compared to an uninformative teacher.
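The adaptive-teaching idea can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's actual algorithm: rewards are linear in features, each demonstration is summarized by a feature vector, the learner takes a projection-style gradient step toward explaining the shown demonstration, and a greedy teacher picks the demonstration whose update most reduces the learner's parameter error, versus a teacher that picks demonstrations uniformly at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: reward is linear in features, r(s) = w . phi(s).
d = 5                             # feature dimension (illustrative)
w_star = rng.normal(size=d)       # teacher's true reward parameters
demos = rng.normal(size=(20, d))  # feature counts of 20 candidate demonstrations

def learner_update(w, demo, lr=0.5):
    """One step pulling w toward explaining the shown demonstration.

    Illustrative rule: shrink the component of the error (w - w_star)
    along the demonstration's feature direction.
    """
    return w + lr * ((w_star - w) @ demo) * demo / (demo @ demo)

def teach(adaptive, rounds=30):
    """Run a teaching session; return the learner's error after each round."""
    w = np.zeros(d)
    errs = []
    for _ in range(rounds):
        if adaptive:
            # Greedy teacher: show the demo whose update most reduces ||w - w*||.
            demo = min(demos,
                       key=lambda x: np.linalg.norm(learner_update(w, x) - w_star))
        else:
            # Uninformative teacher: show a demo chosen uniformly at random.
            demo = demos[rng.integers(len(demos))]
        w = learner_update(w, demo)
        errs.append(np.linalg.norm(w - w_star))
    return errs

err_adaptive = teach(adaptive=True)
err_random = teach(adaptive=False)
print(f"final error, adaptive teacher: {err_adaptive[-1]:.4f}")
print(f"final error, random teacher:   {err_random[-1]:.4f}")
```

In this toy model both teachers make progress (each update can only shrink the error), but the greedy teacher converges far faster because it always targets the direction in which the learner is currently most wrong, mirroring the abstract's contrast between an adaptive and an uninformative teacher.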
