Active Reward Learning

While reward functions are an essential component of many robot learning methods, defining such functions remains a hard problem in many practical applications. For tasks such as grasping, no reliable automatic success measures are available. Defining reward functions by hand requires extensive task knowledge and often leads to undesired emergent behavior. Instead, we propose to learn the reward function through active learning, querying a human expert for labels on a subset of the agent's rollouts. We introduce a framework in which a traditional learning algorithm interacts with the reward-learning component, such that the evolution of the action learner guides the queries of the reward learner. We demonstrate our method on a robot grasping task and show that the learned reward function generalizes to a similar task.
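
To make the interplay concrete, the sketch below shows one possible shape of such a loop. It is a minimal illustration under assumptions not taken from the abstract: a Gaussian-process reward model whose predictive uncertainty decides when to query the expert, a synthetic stand-in for the human label, and a crude hill-climbing step in place of a proper policy-search algorithm. The names `rollout` and `query_human`, and the 0.2 query threshold, are all hypothetical.

```python
# Minimal sketch of an active reward learning loop (illustrative, not the
# paper's implementation): the policy generates rollouts, a GP reward model
# scores them, and the human expert is queried only when the model is unsure.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def rollout(policy_mean):
    """Hypothetical rollout: the policy produces a noisy outcome vector."""
    return policy_mean + rng.normal(scale=0.1, size=policy_mean.shape)

def query_human(outcome):
    """Stand-in for the human expert's reward label (synthetic here)."""
    return -float(np.sum((outcome - 1.0) ** 2))

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
X, y = [], []                          # expert-labeled outcomes and rewards
policy_mean = np.zeros(2)              # toy "policy" parameters
best_reward, best_outcome = -np.inf, policy_mean.copy()

for episode in range(30):
    outcome = rollout(policy_mean)
    if len(X) < 2:                     # bootstrap: always query at first
        reward, queried = query_human(outcome), True
    else:
        mean, std = gp.predict(outcome.reshape(1, -1), return_std=True)
        queried = std[0] > 0.2         # query only when the model is unsure
        reward = query_human(outcome) if queried else float(mean[0])
    if queried:                        # grow the reward model's data set
        X.append(outcome)
        y.append(reward)
        gp.fit(np.array(X), np.array(y))
    if reward > best_reward:           # crude hill-climbing policy update,
        best_reward, best_outcome = reward, outcome
    policy_mean += 0.2 * (best_outcome - policy_mean)  # stand-in for policy search
```

Querying on predictive uncertainty keeps the number of expert labels small, while the policy's own rollouts determine where the reward model is refined, which is the interplay between action learner and reward learner that the abstract describes.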
