Active Perception and Reinforcement Learning

This paper considers adaptive control architectures that integrate active sensorimotor systems with decision systems based on reinforcement learning. One unavoidable consequence of active perception is that the agent's internal representation often confounds external world states. We call this phenomenon perceptual aliasing and show that it destabilizes existing reinforcement learning algorithms with respect to the optimal decision policy. A new decision system that overcomes these difficulties is described. The system incorporates a perceptual subcycle within the overall decision cycle and uses a modified learning algorithm to suppress the effects of perceptual aliasing. The result is a control architecture that learns not only how to solve a task but also where to focus its attention in order to collect necessary sensory information.
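As an illustrative aside (this code is not from the paper, and every name in it is a hypothetical stand-in), the following minimal Python sketch shows the failure mode the abstract describes: two hidden world states that demand different actions are mapped by an impoverished sensor to the same observation, so a tabular learner indexed by observations receives contradictory reward signals.

```python
# Minimal sketch of perceptual aliasing, assuming a two-state world
# and a one-step (bandit-style) Q update. All names are illustrative.
import random
from collections import defaultdict

HIDDEN_STATES = ["left_is_good", "right_is_good"]
ACTIONS = ["go_left", "go_right"]
REWARD = {
    ("left_is_good", "go_left"): 1.0,  ("left_is_good", "go_right"): -1.0,
    ("right_is_good", "go_left"): -1.0, ("right_is_good", "go_right"): 1.0,
}

def observe(hidden_state):
    # Perceptual aliasing: both hidden states yield the same percept.
    return "gray"

q = defaultdict(float)      # Q[(observation, action)]
alpha, epsilon = 0.1, 0.1   # learning rate, exploration rate

for step in range(10_000):
    s = random.choice(HIDDEN_STATES)   # true state, invisible to the agent
    o = observe(s)
    if random.random() < epsilon:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda act: q[(o, act)])
    r = REWARD[(s, a)]
    # The update target mixes rewards from both hidden states, so each
    # Q-value converges to the average payoff rather than either true value.
    q[(o, a)] += alpha * (r - q[(o, a)])

print({k: round(v, 2) for k, v in q.items()})
```

Running this, both Q-values for the aliased observation drift toward the average payoff of roughly zero, so no greedy policy over it is stable. The architecture proposed in the paper addresses exactly this situation: the perceptual subcycle lets the agent spend sensing actions within each decision cycle to disambiguate such states before committing to an overt action.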
