Focus of Attention in Reinforcement Learning

Classification-based reinforcement learning (RL) methods have recently been proposed as an alternative to traditional value-function-based methods. These methods use a classifier to represent a policy: the input (feature vector) to the classifier is a state, and the output (class label) for that state is the desired action. It has long been recognized in the RL community that focusing learning on the more important states can improve performance. In this paper, we investigate focused learning in the context of classification-based RL. Specifically, we define a useful notion of state importance, which we use to prove rigorous bounds on policy loss. Furthermore, we show that a classification-based RL agent may behave arbitrarily poorly if it treats all states as equally important.
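As a rough illustration of the setup the abstract describes (not the paper's algorithm), the sketch below represents a policy as a classifier over state features and weights each training state by an assumed importance score when fitting it. The state features, action labels, importance values, and the use of scikit-learn's DecisionTreeClassifier are all illustrative assumptions.

```python
# Illustrative sketch: classification-based RL policy with per-state
# importance weights. All data and the choice of classifier are
# hypothetical; any classifier supporting sample weights would do.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training set: state feature vectors, the desired action
# for each state (the class label), and an importance score per state
# (how costly a wrong action is in that state).
states = np.array([[0.1, 0.9],
                   [0.4, 0.2],
                   [0.8, 0.7],
                   [0.3, 0.5]])
actions = np.array([0, 1, 1, 0])             # desired action = class label
importance = np.array([5.0, 0.5, 3.0, 1.0])  # assumed per-state importance

# Treating all states as equally important corresponds to uniform weights;
# here the classifier is instead pushed to make its errors on the
# low-importance states.
policy = DecisionTreeClassifier(max_depth=3)
policy.fit(states, actions, sample_weight=importance)

def act(state_features):
    """Policy: map a state's feature vector to an action (class label)."""
    return int(policy.predict(np.asarray(state_features).reshape(1, -1))[0])

print(act([0.2, 0.8]))  # action chosen for a previously unseen state
```

The only design point the sketch is meant to convey is that the classification loss is weighted by state importance rather than uniform, which is the idea the abstract's policy-loss bounds are about.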
