Reinforcement learning of iterative behaviour with multiple sensors

Reinforcement learning allows an agent to be both reactive and adaptive, but it requires a simple yet consistent representation of the task environment. In robotics this representation is the product of perception. Perception is a powerful simplifying mechanism because it ignores much of the world's complexity, mapping many world states onto each of a few representational states. The constraint of consistency conflicts with simplicity, however: a consistent representation must distinguish world states that have distinct utilities, but perception systems with sufficient acuity to do this tend also to make many unnecessary distinctions. In this paper we discuss reinforcement learning and the problem of appropriate perception. We then investigate a method for dealing with the problem, the Lion algorithm [1], and show that it can be used to reduce complexity by decomposing perception. The Lion algorithm does not allow iterative rules to be learned, and we describe modifications that overcome this limitation. We present experimental results that demonstrate their effectiveness in further reducing complexity. Finally, we mention some related research and conclude with suggestions for further work.
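For concreteness, the sketch below pairs tabular one-step Q-learning with a consistency test in the spirit of the Lion algorithm: a perceptual state whose backup would lower its estimated return is flagged as aliased and no longer trusted for action selection. This is a minimal illustration under our own assumptions (a deterministic task, non-empty action sets, and invented names such as choose_action and suppressed), not the implementation evaluated in the paper.

import random
from collections import defaultdict

ALPHA = 0.5    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # exploration rate

Q = defaultdict(float)   # Q[(state, action)] -> estimated return
suppressed = set()       # perceptual states flagged as inconsistent

def choose_action(state, actions):
    """Epsilon-greedy selection; act randomly in suppressed states."""
    if state in suppressed or random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_actions):
    """One-step Q-learning backup with an aliasing test.

    In a deterministic task a consistent state's estimate should only
    be revised upward as values propagate; a negative one-step error
    suggests the perceptual state lumps together world states with
    different utilities, so it is flagged rather than updated.
    """
    target = reward + GAMMA * max(Q[(next_state, a)] for a in next_actions)
    error = target - Q[(state, action)]
    if error < 0:
        suppressed.add(state)   # treat state as perceptually aliased
    else:
        Q[(state, action)] += ALPHA * error

Suppressing rather than updating an aliased state reflects the intuition that no single value can be consistent for a state that conflates world states of different utility.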

[1] Rodney A. Brooks, et al. Challenges for complete creature architectures, 1991.

[2] Rodney A. Brooks, et al. Elephants don't play chess, 1990, Robotics Auton. Syst.

[3] Ming Tan, et al. Cost-Sensitive Reinforcement Learning for Adaptive Classification and Control, 1991, AAAI.

[4] Douglas B. Lenat, et al. Why AM and EURISKO Appear to Work, 1984, Artif. Intell.

[5] Matthew T. Mason. Kicking the Sensing Habit, 1993, AI Mag.

[6] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.

[7] Rodney A. Brooks, et al. A Robust Layered Control System for a Mobile Robot, 1986, IEEE J. Robotics Autom.

[8] Chris Watkins, et al. Learning from delayed rewards, 1989.

[9] David Chapman, et al. Planning for Conjunctive Goals, 1987, Artif. Intell.

[10] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.

[11] David P. Miller. A Twelve-Step Program to More Efficient Robotics, 1993, AI Mag.

[12] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.

[13] Rodney A. Brooks, et al. Intelligence Without Reason, 1991, IJCAI.

[14] Jon Doyle, et al. Rationality and its Roles in Reasoning (Extended Abstract), 1990, AAAI.

[15] R. James Firby, et al. An Investigation into Reactive Planning in Complex Domains, 1987, AAAI.

[16] Peter Dayan, et al. Technical Note: Q-Learning, 1992, Machine Learning.

[17] Reid G. Simmons, et al. Sensible Planning: Focusing Perceptual Attention, 1991, AAAI.

[18] Amy L. Lansky, et al. Reactive Reasoning and Planning, 1987, AAAI.

[19] Stuart E. Dreyfus, et al. Applied Dynamic Programming, 1965.

[20] David Chapman, et al. Pengi: An Implementation of a Theory of Activity, 1987, AAAI.

[21] Dana H. Ballard, et al. Learning to perceive and act by trial and error, 1991, Machine Learning.