Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks

This paper presents U-Tree, a reinforcement learning algorithm that uses selective attention and short-term memory to simultaneously address the intertwined problems of large perceptual state spaces and hidden state. By combining the advantages of work in instance-based (or "memory-based") learning and work with robust statistical tests for separating noise from task structure, the method learns quickly, creates only task-relevant state distinctions, and handles noise well. U-Tree uses a tree-structured representation, and is related to work on Prediction Suffix Trees [Ron et al., 1994], Parti-game [Moore, 1993], the G-algorithm [Chapman and Kaelbling, 1991], and Variable Resolution Dynamic Programming [Moore, 1991]. It builds on Utile Suffix Memory [McCallum, 1995c], which used only short-term memory, not selective perception. The algorithm is demonstrated on a highway driving task in which the agent weaves around slower and faster traffic, using active perception with simulated eye movements. The environment has hidden state, time pressure, stochasticity, more than 21,000 world states, and more than 2,500 percepts. From this environment and sensory system, the agent uses a utile distinction test to build a tree that represents depth-three memory where necessary, and has just 143 internal states, far fewer than the 2,500³ states that would have resulted from a fixed-size history-window approach.
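
To make the representation concrete, the following is a minimal sketch of a U-Tree-style state tree. It assumes observations are tuples of discrete perceptual dimensions, and it substitutes a crude difference-of-means criterion for the paper's statistical utile-distinction test (which compares distributions of future discounted return); the class name `UTreeNode` and the method names are illustrative, not the paper's.

```python
# Illustrative sketch of a U-Tree-style state tree (not the paper's implementation).
# Assumptions: an observation is a tuple of discrete perceptual dimensions, and an
# "instance" is one step of experience (observation, action, reward). The paper's
# utile-distinction test compares distributions of future discounted return; a crude
# difference-of-means threshold stands in for it here.

from collections import defaultdict


class UTreeNode:
    def __init__(self):
        self.split = None             # (history_offset, perceptual_dim) once split
        self.children = {}            # perceptual value -> UTreeNode
        self.instances = []           # experience tuples stored while a leaf
        self.q = defaultdict(float)   # action -> estimated Q-value (leaves only)

    def leaf(self, history):
        """Return the leaf for a history of (obs, action, reward) tuples,
        most recent last. Leaves play the role of the agent's internal states."""
        if self.split is None:
            return self
        offset, dim = self.split
        obs, _, _ = history[-1 - offset]              # look `offset` steps back
        child = self.children.setdefault(obs[dim], UTreeNode())
        return child.leaf(history)

    def try_split(self, offset, dim, returns_by_value, threshold=0.5):
        """Introduce the distinction (offset, dim) if instances with different
        values of that percept have sufficiently different future returns.
        `returns_by_value` maps each perceptual value to a list of returns.
        (Redistribution of instances to the new children is omitted here.)"""
        means = [sum(r) / len(r) for r in returns_by_value.values() if r]
        if len(means) >= 2 and max(means) - min(means) > threshold:
            self.split = (offset, dim)
            return True
        return False


if __name__ == "__main__":
    root = UTreeNode()
    history = [((1, 0), "stay", 0.0), ((0, 2), "left", 1.0)]
    state = root.leaf(history)    # no splits yet, so the root is the only state
    state.q["left"] += 0.1        # Q-values are maintained per leaf and action
```

In this sketch the leaves of the tree serve as the agent's internal states: each leaf stores its own instances and per-action Q-values, and a new distinction (on a current percept or on one a few steps back in short-term memory) is added only where it helps predict return, which is why the learned tree can stay far smaller than a fixed-size history window.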

[1] J. Davenport, Editor, 1960.

[2] S. Ullman. Visual routines, 1984, Cognition.

[3] David Chapman et al. Pengi: An Implementation of a Theory of Activity, 1987, AAAI.

[4] R. Lathe. PhD by thesis, 1988, Nature.

[5] David Chapman et al. Penguins Can Make Cake, 1989, AI Mag.

[6] Andrew W. Moore et al. Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping, 1992, NIPS.

[7] Andrew McCallum et al. Overcoming Incomplete Perception with Utile Distinction Memory, 1993, ICML.

[8] Marco C. Bettoni et al. Made-Up Minds: A Constructivist Approach to Artificial Intelligence, 1993, IEEE Expert.

[9] J. Peng et al. Efficient Learning and Planning Within the Dyna Framework, 1993, IEEE International Conference on Neural Networks.

[10] Dana Ron et al. Learning probabilistic automata with variable memory length, 1994, COLT '94.

[11] Andrew McCallum et al. Instance-Based State Identification for Reinforcement Learning, 1994, NIPS.

[12] Ben J. A. Kröse et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.

[13] Andrew McCallum et al. Instance-Based Utile Distinctions for Reinforcement Learning, 1995.

[14] J. R. Quinlan et al. MDL and Categorical Theories (Continued), 1995, ICML.

[15] Andrew McCallum et al. Reinforcement learning with selective perception and hidden state, 1996.

[16] Rajesh P. N. Rao et al. Embodiment is the foundation, not a level, 1996, Behavioral and Brain Sciences.
