Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach

It is known that perceptual aliasing may significantly diminish the effectiveness of reinforcement learning algorithms [Whitehead and Ballard, 1991]. Perceptual aliasing occurs when multiple situations that are indistinguishable from immediate perceptual input require different responses from the system. For example, if a robot can only see forward, yet the presence of a battery charger behind it determines whether or not it should back up, immediate perception alone is insufficient for choosing the most appropriate action. This is problematic because reinforcement learning algorithms typically learn a control policy that maps immediate perceptual input directly to the optimal choice of action. This paper introduces the predictive distinctions approach to compensate for perceptual aliasing caused by incomplete perception of the world. An additional component, a predictive model, is utilized to track aspects of the world that may not be visible at all times. The model must be learned along with the control policy, and to allow for stochastic actions and noisy perception, a probabilistic model is learned from experience. In the process, the system must discover, on its own, the important distinctions in the world. Experimental results are given for a simple simulated domain, and additional issues are discussed.
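To make the predictive distinctions idea concrete, here is a minimal sketch in Python, not the paper's algorithm: the agent maintains a belief distribution over hypothesized hidden states, updates it with a Bayes filter through a probabilistic model (a fixed transition tensor T and observation matrix O stand in for the model the paper learns from experience, e.g. with a Baum-Welch-style procedure [1, 2]), and attaches Q-values to the inferred hidden state rather than to the raw, aliased percept. The two-state world, the epsilon-greedy loop, and all identifiers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, N_OBS = 2, 2, 1  # two hidden states alias to one percept

# Probabilistic world model. In the paper these parameters are learned from
# experience; here they are fixed to random plausible values for illustration.
T = rng.dirichlet(np.ones(N_STATES), size=(N_ACTIONS, N_STATES))  # T[a, s, s']
O = np.ones((N_STATES, N_OBS))          # O[s', o] = P(o | s'); uninformative
Q = np.zeros((N_STATES, N_ACTIONS))     # Q-values keyed on hidden state

def belief_update(belief, action, obs):
    """Bayes filter: predict through T[action], then reweight by P(obs | s')."""
    predicted = belief @ T[action]      # sum_s b(s) * T[a, s, s']
    posterior = predicted * O[:, obs]
    return posterior / posterior.sum()

def q_update(belief, action, reward, next_belief, alpha=0.1, gamma=0.9):
    """Tabular Q-learning on the most likely hidden state under each belief."""
    s, s_next = belief.argmax(), next_belief.argmax()
    Q[s, action] += alpha * (reward + gamma * Q[s_next].max() - Q[s, action])

# Tiny demo: the true state evolves under T, the percept is always 0 (fully
# aliased), and reward depends on the hidden state the agent cannot see.
belief = np.full(N_STATES, 1.0 / N_STATES)  # start maximally uncertain
true_state = 0
for step in range(1000):
    if rng.random() < 0.1:                          # epsilon-greedy action
        action = int(rng.integers(N_ACTIONS))
    else:
        action = int(Q[belief.argmax()].argmax())
    true_state = rng.choice(N_STATES, p=T[action, true_state])
    obs = 0                                         # percept carries no signal
    reward = 1.0 if (true_state == 1 and action == 1) else 0.0
    next_belief = belief_update(belief, action, obs)
    q_update(belief, action, reward, next_belief)
    belief = next_belief
```

The sketch deliberately omits the two harder problems the paper takes up: estimating T and O from experience, and deciding when to split a hypothesized state so that the system discovers new distinctions on its own.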

[1] L. Baum et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. 1970.

[2] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 1989.

[3] Richard S. Sutton. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming. ML, 1990.

[4] David Chapman and Leslie Pack Kaelbling. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons. IJCAI, 1991.

[5] Lambert E. Wixson. Scaling Reinforcement Learning Techniques via Modularity. ML, 1991.

[6] Steven D. Whitehead. A Complexity Analysis of Cooperative Mechanisms in Reinforcement Learning. AAAI, 1991.

[7] Long-Ji Lin. Programming Robots Using Reinforcement Learning and Teaching. AAAI, 1991.

[8] Ming Tan. Cost-Sensitive Reinforcement Learning for Adaptive Classification and Control. AAAI, 1991.

[9] José del R. Millán et al. Learning to Avoid Obstacles Through Reinforcement. ML, 1991.

[10] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes. 1991.

[11] Gary L. Drescher. Made-Up Minds: A Constructivist Approach to Artificial Intelligence. 1991.

[12] Rich Caruana et al. Intelligent Agent Design Issues: Internal Agent State and Incomplete Perception. 1991.

[13] Michael I. Jordan et al. Forward Models: Supervised Learning with a Distal Teacher. Cognitive Science, 1992.

[14] Sridhar Mahadevan et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning. Artificial Intelligence, 1991.

[15] Sebastian Thrun. Efficient Exploration in Reinforcement Learning. 1992.

[16] Leslie Pack Kaelbling. Learning in Embedded Systems. 1993.

[17] Sven Koenig and Reid G. Simmons. Complexity Analysis of Real-Time Reinforcement Learning. AAAI, 1993.