Reinforcement learning of iterative behaviour with multiple sensors

Reinforcement learning allows an agent to be both reactive and adaptive, but it requires a simple yet consistent representation of the task environment. In robotics this representation is the product of perception. Perception is a powerful simplifying mechanism because it ignores much of the world's complexity, mapping many world states onto each of a few representational states. The constraint of consistency conflicts with simplicity, however: a consistent representation must distinguish world states that have distinct utilities, but perception systems with sufficient acuity to do this tend also to make many unnecessary distinctions. In this paper we discuss reinforcement learning and the problem of appropriate perception. We then investigate a method for dealing with the problem, the Lion algorithm [1], and show that it can be used to reduce complexity by decomposing perception. The Lion algorithm does not allow iterative rules to be learned, and we describe modifications that overcome this limitation. We present experimental results that demonstrate their effectiveness in further reducing complexity. Finally, we mention some related research and conclude with suggestions for further work.
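For concreteness, the sketch below pairs tabular one-step Q-learning with a consistency test in the spirit of the Lion algorithm: a perceptual state whose backup would lower its estimated return is flagged as aliased and no longer trusted for action selection. This is a minimal illustration under our own assumptions (a deterministic task, non-empty action sets, and invented names such as choose_action and suppressed), not the implementation evaluated in the paper.

import random
from collections import defaultdict

ALPHA = 0.5    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # exploration rate

Q = defaultdict(float)   # Q[(state, action)] -> estimated return
suppressed = set()       # perceptual states flagged as inconsistent

def choose_action(state, actions):
    """Epsilon-greedy selection; act randomly in suppressed states."""
    if state in suppressed or random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_actions):
    """One-step Q-learning backup with an aliasing test.

    In a deterministic task a consistent state's estimate should only
    be revised upward as values propagate; a negative one-step error
    suggests the perceptual state lumps together world states with
    different utilities, so it is flagged rather than updated.
    """
    target = reward + GAMMA * max(Q[(next_state, a)] for a in next_actions)
    error = target - Q[(state, action)]
    if error < 0:
        suppressed.add(state)   # treat state as perceptually aliased
    else:
        Q[(state, action)] += ALPHA * error

Suppressing rather than updating an aliased state reflects the intuition that no single value can be consistent for a state that conflates world states of different utility.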

[1] Rodney A. Brooks, et al. Challenges for complete creature architectures, 1991.

[2] Rodney A. Brooks, et al. Elephants don't play chess, 1990, Robotics Auton. Syst.

[3] Ming Tan, et al. Cost-Sensitive Reinforcement Learning for Adaptive Classification and Control, 1991, AAAI.

[4] Douglas B. Lenat, et al. Why AM and EURISKO Appear to Work, 1984, Artif. Intell.

[5] Matthew T. Mason. Kicking the Sensing Habit, 1993, AI Mag.

[6] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.

[7] Rodney A. Brooks, et al. A Robust Layered Control System for a Mobile Robot, 1986, IEEE J. Robotics Autom.

[8] Chris Watkins, et al. Learning from delayed rewards, 1989.

[9] David Chapman, et al. Planning for Conjunctive Goals, 1987, Artif. Intell.

[10] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.

[11] David P. Miller. A Twelve-Step Program to More Efficient Robotics, 1993, AI Mag.

[12] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.

[13] Rodney A. Brooks, et al. Intelligence Without Reason, 1991, IJCAI.

[14] Jon Doyle, et al. Rationality and its Roles in Reasoning (Extended Abstract), 1990, AAAI.

[15] R. James Firby, et al. An Investigation into Reactive Planning in Complex Domains, 1987, AAAI.

[16] Peter Dayan, et al. Technical Note: Q-Learning, 1992, Machine Learning.

[17] Reid G. Simmons, et al. Sensible Planning: Focusing Perceptual Attention, 1991, AAAI.

[18] Amy L. Lansky, et al. Reactive Reasoning and Planning, 1987, AAAI.

[19] Stuart E. Dreyfus, et al. Applied Dynamic Programming, 1965.

[20] David Chapman, et al. Pengi: An Implementation of a Theory of Activity, 1987, AAAI.

[21] Dana H. Ballard, et al. Learning to perceive and act by trial and error, 1991, Machine Learning.