Paper information - Default hierarchy formation and memory exploitation in learning classifier systems


Automated adaptation in a general setting remains a difficult and poorly understood problem. Reinforcement learning control problems model environments in which an automated system must optimize a reinforcement signal by providing inputs to a black box whose internal structure is initially unknown and persistently uncertain. Learning classifier systems (LCSs) are a class of rule-based systems for reinforcement learning control that use genetic algorithms (GAs) for rule discovery. Genetic algorithms are a class of computerized search procedures whose mechanics are based on natural genetics. This study examines two characteristic aspects of LCSs: default hierarchy formation and memory exploitation.

Default hierarchies are sets of rules in which the utilities of partially correct but broadly applicable rules (defaults) are augmented by additional rules (exceptions). By forming default hierarchies, an LCS can store knowledge in parsimonious rule sets that can be incrementally refined. To do this, an LCS must have conflict resolution mechanisms that cause exceptions to consistently override defaults. This study examines typical LCS conflict resolution mechanisms and shows that they are inadequate in many situations. A new conflict resolution strategy, called the priority tuning scheme, is introduced. Experimentation shows that this scheme properly organizes default hierarchies in situations where traditional schemes fail. Analysis reveals that this technique greatly enlarges the class of exploitable default hierarchies.

LCSs have the potential to adaptively exploit memory and extend their capabilities beyond simple stimulus-response behavior. This study develops a class of problems that isolate memory exploitation from other aspects of LCS behavior. Experiments show that an LCS can form rule sets that exploit memory. However, the LCS does not form optimal rule sets because of a limitation in its credit allocation scheme. This study demonstrates the limitation and suggests, as a remedy, an alternative scheme that automatically evolves multi-rule corporations. Preliminary analysis illustrates the potential of this method.

LCSs are a promising approach to reinforcement learning. This study suggests several directions for the refinement and improved understanding of LCSs. Further development of learning systems like LCSs should extend the applicability of automatic systems to tasks that currently require human intervention.
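The idea of a default hierarchy described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Rule` class, the ternary condition strings (with `#` as a wildcard), the `resolve` function, and the three-bit toy task do not come from the study. In particular, picking the most specific matching rule is only a deterministic stand-in for conflict resolution; it is not the priority tuning scheme, whose details the abstract does not give.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical classifier: a ternary condition and an action."""
    condition: str  # string over {'0', '1', '#'}; '#' matches either bit
    action: int

    def matches(self, message: str) -> bool:
        # A rule matches when every non-wildcard position agrees with the message.
        return len(message) == len(self.condition) and all(
            c == "#" or c == m for c, m in zip(self.condition, message)
        )

    def specificity(self) -> int:
        # Number of non-wildcard positions in the condition.
        return sum(c != "#" for c in self.condition)


def resolve(rules, message):
    """Return the action of the most specific matching rule, or None.

    Assumption: specificity alone decides conflicts, so an exception
    always overrides a broader default that also matches.
    """
    matching = [r for r in rules if r.matches(message)]
    if not matching:
        return None
    return max(matching, key=Rule.specificity).action


# Toy task (assumed): respond 1 to message '111' and 0 to everything else.
hierarchy = [
    Rule("###", 0),  # default: broadly applicable, wrong only for '111'
    Rule("111", 1),  # exception: overrides the default where it errs
]
```

Calling `resolve(hierarchy, "111")` selects the exception, while any other three-bit message falls through to the default. The two-rule hierarchy covers a task for which a set of always-correct rules such as `0##`, `10#`, `110`, and `111` needs four entries, which is the kind of parsimony the abstract refers to.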

Robert Elliott Smith