Summary of Proposal for Public Release

A vast number of problems of economic and scientific interest involve sequences of actions in which the effects of one action influence the expected utility of subsequent actions. These sequential decision problems include such diverse applications as inventory management, the control of robots and industrial processes, playing backgammon, and planning under uncertainty, all of which are made more challenging by their sequential and stochastic aspects. Many problems in robotics and artificial intelligence are also of this nature, as indeed are most of the decision-making and planning problems faced by people and animals in their daily lives. Reinforcement learning is a new body of theory and techniques for solving such sequential decision problems; grounded in classical methods such as dynamic programming and inspired by animal learning theory, it makes larger and more diverse problems tractable. The objectives of my research are to create new reinforcement learning methods that remove some of the limitations on their widespread application, and to develop reinforcement learning as a model of intelligence that could approach human abilities.
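To make the setting concrete, the following is a minimal sketch of one standard reinforcement learning method, tabular Q-learning, a temporal-difference technique in the dynamic-programming tradition mentioned above. The toy chain-world environment, its size, and all parameter values (ALPHA, GAMMA, EPSILON) are illustrative assumptions for this sketch, not details of the proposal itself.

    import random

    # A minimal sketch of tabular Q-learning on a hypothetical chain-world.
    # States 0..N_STATES-1; actions: 0 = move left, 1 = move right; a reward
    # of 1.0 is given only on reaching the rightmost state. All names and
    # parameter values here are illustrative assumptions.

    N_STATES = 6
    ACTIONS = (0, 1)    # 0: move left, 1: move right
    ALPHA = 0.1         # learning rate
    GAMMA = 0.95        # discount factor
    EPSILON = 0.1       # exploration rate

    def step(state, action):
        """Deterministic toy dynamics: slide along the chain; reward at the end."""
        nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
        reward = 1.0 if nxt == N_STATES - 1 else 0.0
        done = nxt == N_STATES - 1
        return nxt, reward, done

    # Tabular action-value function, initialized to zero.
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    for episode in range(500):
        s = 0
        done = False
        while not done:
            # Epsilon-greedy action selection: mostly exploit, sometimes explore.
            if random.random() < EPSILON:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r, done = step(s, a)
            # One-step temporal-difference (Q-learning) update.
            target = r + (0.0 if done else GAMMA * max(Q[(s2, act)] for act in ACTIONS))
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2

    # Print the learned greedy action for each state.
    print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})

The one-step update nudges each state-action value toward the immediate reward plus the discounted value of the best subsequent action, which is precisely how the influence of one action on the utility of later actions is captured.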
