Credit assignment and discovery in classifier systems

Classifier systems are “discovery” production rule systems that use the genetic algorithm for discovery and allocate credit through the bucket brigade. For any given problem, the success of a classifier system depends on four things: the choice of representation, the system's ability to reach reward or punishment states (evaluation states), accurate estimation of the relative merit of individual classifiers, and the genetic algorithm's ability to use information about the current rule population to generate better rules. This article examines the adequacy of the bucket brigade and of backward averaging for credit assignment. It also reviews a preliminary study of two hybrid credit-assignment variants, applied both to fully enumerated rule sets and in conjunction with discovery. Several theoretical examples, including one from the literature, highlight potential difficulties with each of these methods. Preliminary results are reported, and tentative similarities between these hybrids and Sutton's Adaptive Heuristic Critic (AHC) are suggested.
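To make the contrast between the bucket brigade and backward averaging concrete, here is a minimal Python sketch under loudly stated assumptions. A fixed chain of five hypothetical classifiers fires one per time step, and payoff arrives only after the last one fires. The bucket brigade is a simplified form of Holland's bid-passing scheme: bid coefficient `c`, with no taxes, specificity terms, or competition. "Backward averaging" is modeled here as nudging every rule that fired toward the episode payoff with step size `beta`. That form is an illustrative assumption, not necessarily the article's definition. All function names and parameter values are hypothetical.

```python
"""Toy comparison of two credit-assignment schemes on a fixed rule chain.

Illustrative assumptions (not the article's exact formulation):
- n hypothetical classifiers fire one per time step, in the order 0..n-1.
- The environment pays off only after the last classifier fires.
- Bucket brigade (simplified): each firing classifier pays a bid of
  c * strength to the classifier that fired on the previous step, and the
  last classifier collects the payoff. No taxes, specificity, or competition.
- "Backward averaging" is modeled as moving every classifier that fired
  toward the episode payoff with step size beta.
"""


def bucket_brigade_episode(strengths, payoff, c=0.1):
    """Run one episode of the simplified bucket brigade, updating in place."""
    for t in range(len(strengths)):
        bid = c * strengths[t]
        strengths[t] -= bid          # the active classifier pays its bid ...
        if t > 0:
            strengths[t - 1] += bid  # ... to the classifier active one step earlier
    strengths[-1] += payoff          # only the final classifier sees the payoff


def backward_averaging_episode(strengths, payoff, beta=0.1):
    """Nudge every classifier that fired toward the episode payoff, in place."""
    for t in range(len(strengths)):
        strengths[t] += beta * (payoff - strengths[t])


def run(update, n_rules=5, episodes=200, payoff=100.0, initial=10.0):
    """Apply one credit-assignment rule for several episodes; return strengths."""
    strengths = [initial] * n_rules
    for _ in range(episodes):
        update(strengths, payoff)
    return strengths


if __name__ == "__main__":
    for episodes in (1, 3, 2000):
        bb = run(bucket_brigade_episode, episodes=episodes)
        ba = run(backward_averaging_episode, episodes=episodes)
        print(f"after {episodes} episode(s)")
        print("  bucket brigade:    ", [round(s, 2) for s in bb])
        print("  backward averaging:", [round(s, 2) for s in ba])
```

With these settings, one episode of the bucket brigade credits only the last rule, giving [10.0, 10.0, 10.0, 10.0, 109.0]. The payoff moves back one link per episode, so the first rule's strength first changes in episode n. The averaging scheme moves every rule at once. At the bucket brigade's fixed point, each rule's bid equals what it receives, so every rule in the chain settles at payoff / c (1000 here). The averaging scheme settles at the payoff itself.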

[1] Tom M. Mitchell et al. Learning from Solution Paths: An Approach to the Credit Assignment Problem. 1982, AI Mag.

[2] Lashon B. Booker et al. Triggered Rule Discovery in Classifier Systems. 1989, ICGA.

[3] Richard S. Sutton et al. Temporal credit assignment in reinforcement learning. 1984.

[4] Lashon B. Booker et al. Intelligent Behavior as an Adaptation to the Task Environment. 1982.

[5] Gunar E. Liepins et al. Machine learning applications to job shop scheduling. 1988, IEA/AIE '88.

[6] John H. Holland et al. Empirical studies of default hierarchies and sequences of rules in learning classifier systems. 1988.

[7] A. L. Samuel et al. Some Studies in Machine Learning Using the Game of Checkers. 1967, IBM J. Res. Dev.

[8] Gunar E. Liepins et al. Genetic algorithms: Foundations and applications. 1990.

[9] John H. Holland et al. Properties of the Bucket Brigade. 1985, ICGA.

[10] James L. McClelland et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations. 1986.

[11] Gunar E. Liepins et al. Alternatives for Classifier System Credit Assignment. 1989, IJCAI.

[12] Ian H. Witten et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments. 1977, Inf. Control.

[13] Geoffrey E. Hinton et al. Learning internal representations by error propagation. 1986.

[14] A. L. Samuel et al. Some studies in machine learning using the game of checkers. II: recent progress. 1967.

[15] Richard S. Sutton et al. Neuronlike adaptive elements that can solve difficult learning control problems. 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[16] Stewart W. Wilson. Hierarchical Credit Allocation in a Classifier System. 1987, IJCAI.

[17] John H. Holland et al. Cognitive Systems Based on Adaptive Algorithms. 1978.

[18] Stephen F. Smith et al. A learning system based on genetic adaptive algorithms. 1980.

[19] John H. Holland et al. Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems. 1995.

[20] Tom M. Mitchell et al. Learning by experimentation: acquiring and refining problem-solving heuristics. 1993.

[21] Lawrence Davis et al. Genetic Algorithms and Simulated Annealing. 1987.

[22] Steven Edward Hampson et al. A neural model of adaptive behavior. 1983.