Contextual Control Policy Selection

Jefferson A. Coelho

Every autonomous agent operating in a realistic setting must cope with incomplete state information. Sensory limitations, due to hardware constraints and/or limited interpretation algorithms, introduce hidden state that can prevent the acquisition of optimal control policies for a given task. This paper addresses the problem within the dynamical systems framework. The idea is to treat the agent in its environment as a dynamical system and to augment the original state space with contextual cues extracted empirically as the agent exercises its existing control policies. Contextual cues are provided by the correlation between dynamic features of the agent-environment interaction and agent performance. In principle, the augmented state space allows the agent to exploit control policies whenever they are likely to succeed, to gather more information when its context is ambiguous, and to explore control alternatives in unfavorable contexts. Initial experiments involving an agent with impoverished sensing capabilities in a simulated, dynamic environment suggest that this approach may offer a relatively simple way to extract meaningful context information from dynamical systems and to enhance the agent's performance.
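The exploit / gather / explore decision described above could be realized in many ways; the abstract does not specify an estimator. The following Python sketch is one hypothetical, instance-based realization: each policy keeps a memory of (context features, performance) samples, expected performance in the current context is estimated with a simple k-nearest-neighbour average, and the resulting estimates drive the three-way choice. The class name `ContextualPolicySelector`, the k-NN estimator, and the thresholds are illustrative assumptions, not the paper's method.

```python
import math
import random

class ContextualPolicySelector:
    """Illustrative sketch of contextual control policy selection.

    For each candidate policy we keep (context_features, outcome)
    pairs observed while the policy was exercised; the features are
    assumed to be dynamic features of the agent-environment
    interaction. Expected performance in the current context is
    estimated by averaging the outcomes of the k nearest stored
    contexts (an assumed estimator, chosen for simplicity).
    """

    def __init__(self, policies, k=3, exploit_threshold=0.7, ambiguity_margin=0.1):
        self.policies = list(policies)                # candidate control policies
        self.memory = {p: [] for p in self.policies}  # per-policy experience
        self.k = k
        self.exploit_threshold = exploit_threshold    # "likely to succeed" cutoff
        self.ambiguity_margin = ambiguity_margin      # cutoff for ambiguous contexts

    def record(self, policy, features, outcome):
        """Store one observed (context, performance) sample for a policy."""
        self.memory[policy].append((tuple(features), float(outcome)))

    def _predict(self, policy, features):
        """k-NN estimate of the policy's performance in this context."""
        samples = self.memory[policy]
        if not samples:
            return 0.0  # no experience: assume nothing about this policy
        nearest = sorted(
            (math.dist(features, f), r) for f, r in samples
        )[: self.k]
        return sum(r for _, r in nearest) / len(nearest)

    def select(self, features):
        """Return (action, policy): exploit, gather more data, or explore."""
        scored = sorted(
            ((self._predict(p, features), p) for p in self.policies),
            reverse=True, key=lambda t: t[0],
        )
        best_score, best_policy = scored[0]
        runner_up = scored[1][0] if len(scored) > 1 else -math.inf
        if best_score >= self.exploit_threshold:
            if best_score - runner_up < self.ambiguity_margin:
                return "gather", best_policy   # context ambiguous: probe further
            return "exploit", best_policy      # favorable, unambiguous context
        return "explore", random.choice(self.policies)  # unfavorable context

if __name__ == "__main__":
    # Hypothetical usage with two named policies and 2-D context features.
    selector = ContextualPolicySelector(["track", "avoid"])
    selector.record("track", (0.2, 0.9), outcome=1.0)
    selector.record("avoid", (0.8, 0.1), outcome=0.3)
    print(selector.select((0.25, 0.85)))  # ('exploit', 'track')
```

The instance-based memory here merely stands in for whatever empirical correlation between dynamic features and performance the paper extracts; any estimator that maps a context to a predicted payoff per policy would fit the same selection logic.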
