Incorporating Advice into Agents that Learn from Reinforcements

Learning from reinforcements is a promising approach for creating intelligent agents. However, reinforcement learning usually requires a large number of training episodes. We present an approach that addresses this shortcoming by allowing a connectionist Q-learner to accept advice given, at any time and in a natural manner, by an external observer. In our approach, the advice-giver watches the learner and occasionally makes suggestions, expressed as instructions in a simple programming language. Based on techniques from knowledge-based neural networks, these programs are inserted directly into the agent's utility function. Subsequent reinforcement learning further integrates and refines the advice. We present empirical evidence showing that our approach leads to statistically significant gains in expected reward. Importantly, the advice improves the expected reward regardless of the stage of training at which it is given.
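
The abstract describes two steps: inserting advice into a connectionist Q-learner's utility function using knowledge-based neural network techniques, and then letting reinforcement learning refine it. The Python sketch below illustrates both under loudly stated assumptions; it is not the authors' implementation. Advice is limited to a single conjunctive IF-THEN rule over boolean state features (the paper's advice language is richer). The rule becomes a sigmoid hidden unit following the usual KBANN convention: weight ω on features that must be on, -ω on negated features, and bias -(P - 1/2)ω for P positive conditions, so the unit fires only when the whole condition holds. The unit is then wired to the advised action's Q-value. The names `AdviceUnit`, `AdvisableQNet`, `add_advice`, `strength`, and the value `OMEGA = 4.0` are all hypothetical.

```python
import math
import random

OMEGA = 4.0  # hypothetical KBANN link weight; not a value from the paper


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class AdviceUnit:
    """Hidden unit encoding one conjunctive IF-THEN rule, KBANN-style.

    `positive` and `negated` list indices of boolean input features that must
    be 1 and 0, respectively, for the rule's condition to hold.
    """

    def __init__(self, positive, negated, omega=OMEGA):
        self.weights = {i: omega for i in positive}
        self.weights.update({i: -omega for i in negated})
        # Bias puts the threshold halfway between "all conditions met"
        # and "one condition missing".
        self.bias = -(len(positive) - 0.5) * omega

    def activation(self, state):
        net = self.bias + sum(w * state[i] for i, w in self.weights.items())
        return sigmoid(net)


class AdvisableQNet:
    """Q(s, a) = b[a] + sum_i v[a][i]*s[i] + sum_j w[a][j]*h_j(s).

    The h_j are hidden units added by advice; v holds direct input weights.
    """

    def __init__(self, n_features, n_actions, seed=0):
        rng = random.Random(seed)
        self.n_actions = n_actions
        self.units = []
        self.v = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
                  for _ in range(n_actions)]
        self.w = [[] for _ in range(n_actions)]
        self.b = [rng.uniform(-0.01, 0.01) for _ in range(n_actions)]

    def add_advice(self, positive, negated, action, strength=1.0):
        """Insert 'IF condition THEN prefer action' as a new hidden unit."""
        self.units.append(AdviceUnit(positive, negated))
        for a in range(self.n_actions):
            self.w[a].append(strength if a == action else 0.0)

    def _hidden(self, state):
        return [u.activation(state) for u in self.units]

    def q_values(self, state):
        h = self._hidden(state)
        return [self.b[a]
                + sum(vi * si for vi, si in zip(self.v[a], state))
                + sum(wj * hj for wj, hj in zip(self.w[a], h))
                for a in range(self.n_actions)]

    def td_update(self, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
        """One Q-learning step; only output-layer weights are updated here."""
        target = r if done else r + gamma * max(self.q_values(s_next))
        error = target - self.q_values(s)[a]
        for i, si in enumerate(s):
            self.v[a][i] += alpha * error * si
        for j, hj in enumerate(self._hidden(s)):
            self.w[a][j] += alpha * error * hj
        self.b[a] += alpha * error
        return error


net = AdvisableQNet(n_features=3, n_actions=2)
# Hypothetical advice: IF feature 0 is on AND feature 2 is off THEN prefer action 1.
net.add_advice(positive=[0], negated=[2], action=1, strength=2.0)
print(net.q_values([1, 0, 0]))  # action 1 now scores well above action 0
```

For brevity, `td_update` adjusts only the output weights. A fuller version would also backpropagate the temporal-difference error into each advice unit's incoming weights, so that later reinforcement learning can refine, or partly override, the advice itself, as the abstract describes.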
