Counter example for Q-bucket-brigade under prediction problem

Aiming to clarify the convergence and divergence conditions of Learning Classifier Systems (LCS), this paper explores (1) an extreme condition under which the reinforcement process of an LCS diverges, and (2) methods for avoiding such divergence. Building on our previous work, which showed the equivalence between the reinforcement process of LCS and reinforcement learning (RL) with function approximation (FA), we present a counterexample for LCS with Q-bucket-brigade based on the 11-state star problem, originally proposed to demonstrate the divergence of Q-learning with linear FA. Empirical results obtained by applying this counterexample to LCS confirm the theoretical predictions: (1) LCS with Q-bucket-brigade diverges under the prediction problem, in which the action-selection policy is fixed; and (2) this divergence is avoided by using implicit-bucket-brigade or by applying the residual gradient algorithm to Q-bucket-brigade.
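
The sketch below illustrates the kind of divergence the abstract refers to, using Baird's star counterexample in its common 7-state form (the paper itself uses an 11-state variant). Under a fixed policy that always moves to the center state and a linear value function, a semi-gradient update of the kind implemented by Q-bucket-brigade grows without bound, while the residual gradient update stays bounded. All specifics here (gamma, step size, feature construction, initial weights) are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

# Baird's star counterexample, 7-state form (6 peripheral states + 1 center).
# All rewards are zero; the fixed target policy always transitions to the center.
# The value function is linear in w with an overcomplete feature representation.
GAMMA, ALPHA, SWEEPS = 0.99, 0.01, 1000   # illustrative parameters
N_PERIPHERAL = 6                          # states 0..5 peripheral, state 6 is the center

def features(s):
    """Overcomplete linear features used in Baird's construction."""
    phi = np.zeros(N_PERIPHERAL + 2)
    if s < N_PERIPHERAL:                  # peripheral state: V(s) = 2*w[s] + w[7]
        phi[s] = 2.0
        phi[-1] = 1.0
    else:                                 # center state:     V(s) = w[6] + 2*w[7]
        phi[N_PERIPHERAL] = 1.0
        phi[-1] = 2.0
    return phi

def run(residual):
    w = np.array([1., 1., 1., 1., 1., 1., 10., 1.])    # classic initialization
    phis = [features(s) for s in range(N_PERIPHERAL + 1)]
    center = features(N_PERIPHERAL)
    for _ in range(SWEEPS):
        dw = np.zeros_like(w)
        for phi in phis:                                # synchronous sweep over all states
            delta = GAMMA * (center @ w) - phi @ w      # reward is always zero
            grad = (phi - GAMMA * center) if residual else phi
            dw += delta * grad
        w = w + ALPHA * dw
    return np.linalg.norm(w)

print("semi-gradient (Q-bucket-brigade-like) ||w|| =", run(residual=False))  # grows without bound
print("residual gradient                     ||w|| =", run(residual=True))   # stays bounded
```

The two runs differ only in the update direction: the semi-gradient rule follows delta * phi(s), whereas the residual gradient rule follows the full gradient of the squared Bellman error, delta * (phi(s) - gamma * phi(s')), which is what makes it a true gradient descent and hence stable for a sufficiently small step size.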
