Anticipatory Classifier System with Average Reward Criterion in Discretized Multi-Step Environments

Anticipatory Classifier Systems (ACS) were originally designed to address both single-step and multistep decision problems. In the multistep case, the objective was to maximize the total discounted reward, usually with an update rule based on Q-learning. Studies on other Learning Classifier Systems (LCS), however, have identified many real-world sequential decision problems in which the preferred objective is to maximize the average of successive rewards. This paper proposes a modification of the learning component that allows the system to address such problems. The modified system, called AACS2 (Averaged ACS2), is tested on three multistep benchmark problems.
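To illustrate why the choice of objective matters, the following minimal Python sketch contrasts the two criteria on two hypothetical reward streams. It is not the AACS2 update rule. The function names, the reward sequences, and the discount factor of 0.5 are assumptions chosen for illustration only.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over the sequence (discounted criterion)."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def average_reward(rewards):
    """Arithmetic mean of successive rewards (average-reward criterion)."""
    return sum(rewards) / len(rewards)


# Hypothetical behaviours observed over eight steps:
# a short cycle paying 1 every 2 steps, and a long cycle paying 3 every 4 steps.
short_cycle = [0, 1, 0, 1, 0, 1, 0, 1]
long_cycle = [0, 0, 0, 3, 0, 0, 0, 3]

gamma = 0.5  # assumed discount factor, deliberately small to make the effect visible

# The discounted criterion favours the short cycle (earlier rewards weigh more),
# while the average criterion favours the long cycle (more reward per step).
prefers_short_discounted = (
    discounted_return(short_cycle, gamma) > discounted_return(long_cycle, gamma)
)
prefers_long_average = average_reward(long_cycle) > average_reward(short_cycle)
```

With a strong discount, the agent prefers the behaviour that pays sooner even though it earns less per step. The average-reward criterion ranks the two behaviours the other way around, which is the kind of problem the abstract refers to.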
