Paper information - An optimization-based categorization of reinforcement learning environments

An optimization-based categorization of reinforcement learning environments

This paper proposes a categorization of reinforcement learning environments based on the optimization of a reinforcement signal over time. Environments are classified by the simplest agent that can possibly achieve optimal reinforcement. Two parameters, h and β, abstractly characterize the complexity of an agent: the ideal (h, β)-agent uses the input information provided by the environment and at most h bits of local storage to choose an action that maximizes the discounted sum of the next β reinforcements. In an (h, β)-environment, an ideal (h, β)-agent achieves the maximum possible expected reinforcement for that environment. The paper discusses the special cases when either h = 0 or β = 1 in detail, describes some theoretical bounds on h and β, and re-explores a well-known reinforcement learning environment with this new notation.
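As a rough illustration of the objective described in the abstract, the sketch below computes a discounted sum over a limited number of upcoming reinforcements and picks the action that maximizes it. It is not taken from the paper: the function names, the discount factor `gamma`, the horizon argument `beta` (standing in for the abstract's second parameter), and the table of predicted reinforcements are all assumptions made for this example, and the h bits of local storage are not modeled.

```python
from typing import Dict, Sequence


def truncated_discounted_return(
    rewards: Sequence[float], beta: int, gamma: float = 0.9
) -> float:
    """Discounted sum of the first `beta` reinforcements in `rewards`.

    `gamma` is an assumed discount factor; the abstract does not give one.
    """
    return sum((gamma ** k) * r for k, r in enumerate(rewards[:beta]))


def greedy_action(
    predicted: Dict[str, Sequence[float]], beta: int, gamma: float = 0.9
) -> str:
    """Return the action whose predicted reinforcement sequence has the
    largest discounted sum over the next `beta` steps.

    `predicted` is a hypothetical lookup table mapping each action to the
    reinforcements expected after taking it.
    """
    return max(
        predicted,
        key=lambda a: truncated_discounted_return(predicted[a], beta, gamma),
    )


# Hypothetical predictions: "left" pays immediately, "right" pays later.
predicted = {
    "left": [1.0, 0.0, 0.0],
    "right": [0.0, 1.0, 1.0],
}

print(greedy_action(predicted, beta=1))  # left
print(greedy_action(predicted, beta=3))  # right
```

With a horizon of 1 the agent in this sketch looks only at the immediate reinforcement and prefers "left"; with a horizon of 3 the delayed reinforcements dominate and it prefers "right".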

Michael L. Littman

[1] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.

[2] E. Kalai, et al. Finite Rationality and Interpersonal Complexity in Repeated Games, 1988.

[3] David H. Ackley, et al. Generalization and Scaling in Reinforcement Learning, 1989, NIPS.

[4] Robert B. Allen, et al. Adaptive training for connectionist state machines, 1989, CSC '89.

[5] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.

[6] Stewart W. Wilson. The animat path to AI, 1991.

[7] David H. Ackley, et al. Interactions between learning and evolution, 1991.

[8] Richard S. Sutton, et al. Planning by Incremental Dynamic Programming, 1991, ML.

[9] Zoubin Ghahramani, et al. Temporal processing with connectionist networks, 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[10] David H. Ackley, et al. Adaptation in Constant Utility Non-Stationary Environments, 1991, ICGA.

[11] S. Thrun. Efficient Exploration in Reinforcement Learning, 1992.

[12] Alan F. Murray, et al. International Joint Conference on Neural Networks, 1993.

[13] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.