Learning from Reinforcement and Advice Using Composite Reward Functions

Reinforcement learning has become a widely used methodology for creating intelligent agents across many applications. However, its performance deteriorates in tasks with sparse feedback or long intervals between reinforcements. This paper presents an extension in which an advisory entity provides additional feedback to the agent. The agent combines the rewards provided by the environment with this advice in order to learn faster and to obtain policies that are tuned towards the advisor's preferences while still achieving the underlying task objective. The advice is converted into "tuning" or user rewards that, together with the task rewards, define a composite reward function that more accurately reflects the advisor's perception of the task. At the same time, formal bounds on the user reward component prevent incorrect user rewards from creating erroneous loops. The approach is illustrated on a robot navigation task.
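The abstract does not give the exact form of the composite reward or of the bounds on the user reward, so the following is only a minimal sketch under illustrative assumptions, not the paper's method. It assumes the composite reward is the sum of the task reward and the user reward, that every task step costs a fixed `STEP_COST`, and that the user reward is clipped to at most 0.9 × `STEP_COST` in magnitude. Under that assumption every transition has a strictly negative composite reward, so the undiscounted reward collected around any cycle is negative and no loop can appear profitable. The 5-state corridor, the `advice` function (which wrongly rewards moving left in state 2), and all learning parameters are hypothetical.

```python
import random

STEP_COST = 1.0          # hypothetical per-step task penalty
N_STATES = 5             # hypothetical corridor: states 0..4, state 4 is the goal
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)


def clip_user_reward(user_r, step_cost=STEP_COST, margin=0.9):
    """Assumed bound: keep |user_r| <= margin * step_cost, strictly below step_cost."""
    bound = margin * step_cost
    return max(-bound, min(bound, user_r))


def composite_reward(task_r, user_r, bounded=True):
    """Assumed composite reward: task reward plus (optionally bounded) user reward."""
    return task_r + (clip_user_reward(user_r) if bounded else user_r)


def cycle_return(task_rs, user_rs, bounded=True):
    """Undiscounted composite reward collected around one cycle of transitions."""
    return sum(composite_reward(t, u, bounded) for t, u in zip(task_rs, user_rs))


def advice(state, action):
    """Hypothetical advisor that (mistakenly) rewards moving LEFT in state 2."""
    return 5.0 if (state, action) == (2, LEFT) else 0.0


def step(state, action):
    """Deterministic corridor move; every step incurs the task cost."""
    nxt = max(0, state - 1) if action == LEFT else min(N_STATES - 1, state + 1)
    return nxt, -STEP_COST, nxt == N_STATES - 1


def q_learning(bounded, episodes=500, max_steps=50, alpha=0.5, gamma=0.9,
               epsilon=0.2, seed=0):
    """Tabular epsilon-greedy Q-learning on the composite reward."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):  # cap episode length so loops still terminate
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2, task_r, done = step(s, a)
            r = composite_reward(task_r, advice(s, a), bounded)
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
            if done:
                break
    return q


def greedy_policy(q):
    """Greedy action for each non-goal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    # The advised loop 2 -> 1 -> 2: LEFT (advised), then RIGHT (not advised).
    print(cycle_return([-1.0, -1.0], [5.0, 0.0], bounded=False))  # 3.0
    print(cycle_return([-1.0, -1.0], [5.0, 0.0], bounded=True))   # about -1.1
    print(greedy_policy(q_learning(bounded=True)))
    print(greedy_policy(q_learning(bounded=False)))
```

With the bound in place, Q-learning learns to move right from every state. Without it, the inflated advice makes the 2 → 1 → 2 loop worth +3 per cycle, and the greedy action in state 2 becomes LEFT, so the agent never reaches the goal. This particular bound is deliberately conservative: it also caps advice that would have been helpful.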
