Counter example for Q-bucket-brigade under prediction problem

Aiming to clarify the convergence and divergence conditions of Learning Classifier Systems (LCS), this paper explores (1) an extreme condition under which the reinforcement process of an LCS diverges, and (2) methods for avoiding such divergence. Building on our previous work, which showed the equivalence between the reinforcement process of LCS and reinforcement learning (RL) with function approximation (FA), we present a counterexample for LCS with Q-bucket-brigade based on the 11-state star problem, originally proposed to demonstrate the divergence of Q-learning with linear FA. Empirical results obtained by applying this counterexample to LCS confirm the theoretical predictions: (1) LCS with Q-bucket-brigade diverges under the prediction problem, in which the action-selection policy is fixed; and (2) this divergence is avoided by using implicit-bucket-brigade or by applying the residual gradient algorithm to Q-bucket-brigade.
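
The sketch below illustrates the kind of divergence the abstract refers to, using Baird's star counterexample in its common 7-state form (the paper itself uses an 11-state variant). Under a fixed policy that always moves to the center state and a linear value function, a semi-gradient update of the kind implemented by Q-bucket-brigade grows without bound, while the residual gradient update stays bounded. All specifics here (gamma, step size, feature construction, initial weights) are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

# Baird's star counterexample, 7-state form (6 peripheral states + 1 center).
# All rewards are zero; the fixed target policy always transitions to the center.
# The value function is linear in w with an overcomplete feature representation.
GAMMA, ALPHA, SWEEPS = 0.99, 0.01, 1000   # illustrative parameters
N_PERIPHERAL = 6                          # states 0..5 peripheral, state 6 is the center

def features(s):
    """Overcomplete linear features used in Baird's construction."""
    phi = np.zeros(N_PERIPHERAL + 2)
    if s < N_PERIPHERAL:                  # peripheral state: V(s) = 2*w[s] + w[7]
        phi[s] = 2.0
        phi[-1] = 1.0
    else:                                 # center state:     V(s) = w[6] + 2*w[7]
        phi[N_PERIPHERAL] = 1.0
        phi[-1] = 2.0
    return phi

def run(residual):
    w = np.array([1., 1., 1., 1., 1., 1., 10., 1.])    # classic initialization
    phis = [features(s) for s in range(N_PERIPHERAL + 1)]
    center = features(N_PERIPHERAL)
    for _ in range(SWEEPS):
        dw = np.zeros_like(w)
        for phi in phis:                                # synchronous sweep over all states
            delta = GAMMA * (center @ w) - phi @ w      # reward is always zero
            grad = (phi - GAMMA * center) if residual else phi
            dw += delta * grad
        w = w + ALPHA * dw
    return np.linalg.norm(w)

print("semi-gradient (Q-bucket-brigade-like) ||w|| =", run(residual=False))  # grows without bound
print("residual gradient                     ||w|| =", run(residual=True))   # stays bounded
```

The two runs differ only in the update direction: the semi-gradient rule follows delta * phi(s), whereas the residual gradient rule follows the full gradient of the squared Bellman error, delta * (phi(s) - gamma * phi(s')), which is what makes it a true gradient descent and hence stable for a sufficiently small step size.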
