An Adaptive Approach for the Exploration-Exploitation Dilemma for Learning Agents

Learning agents must deal with the exploration-exploitation dilemma. Choosing between exploration and exploitation is particularly difficult in dynamic systems, especially large-scale ones such as economic systems. Recent research shows that there is neither an optimal nor a unique solution to this problem. In this paper, we propose an adaptive approach based on meta-rules that adjust the choice between exploration and exploitation. This adaptive approach relies on variations in the agents' performance. To validate the approach, we apply it to economic systems and compare it with two adaptive methods, one local and one global; we adapt both methods, originally proposed by Wilson, to economic systems. Moreover, we compare different exploration strategies and study their influence on the agents' performance.
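The meta-rule idea described above can be illustrated with a minimal sketch: an epsilon-greedy bandit agent whose exploration rate is lowered when its performance improves and raised when it degrades. The class name, parameters, and update constants below are illustrative assumptions for the sake of the sketch, not the paper's actual method.

```python
import random

class AdaptiveEpsilonAgent:
    """Epsilon-greedy agent whose exploration rate epsilon is adjusted
    by a simple meta-rule driven by variations in its own performance.
    All names and constants are illustrative, not from the paper."""

    def __init__(self, n_actions, epsilon=0.5, alpha=0.1):
        self.q = [0.0] * n_actions  # action-value estimates
        self.epsilon = epsilon      # current exploration rate
        self.alpha = alpha          # learning rate for the estimates
        self.avg_reward = 0.0       # running performance estimate

    def act(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.q))                # explore
        return max(range(len(self.q)), key=self.q.__getitem__)  # exploit

    def update(self, action, reward):
        # Standard incremental value update.
        self.q[action] += self.alpha * (reward - self.q[action])
        # Meta-rule (illustrative): performance improving -> exploit more;
        # performance degrading -> explore more.
        delta = reward - self.avg_reward
        if delta > 0:
            self.epsilon = max(0.01, self.epsilon * 0.95)
        else:
            self.epsilon = min(1.0, self.epsilon * 1.05)
        self.avg_reward += 0.1 * delta
```

The design choice here is that epsilon is never fixed in advance: it tracks the agent's recent success, which is the essence of an adaptive (rather than static or purely time-decayed) exploration schedule.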

[1] Peter Dayan, et al. Technical Note: Q-Learning, 1992, Machine Learning.

[2] Martin V. Butz, et al. An algorithmic description of XCS, 2000, Soft Computing.

[3] Terence C. Fogarty, et al. Social Simulation Using a Multi-agent Model Based on Classifier Systems: The Emergence of Vacillating Behaviour in the "El Farol" Bar Problem, 2001, IWLCS.

[4] Donald A. Sofge, et al. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, 1992.

[5] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.

[6] Rina Azoulay-Schwartz, et al. Exploitation vs. exploration: choosing a supplier in an environment of incomplete information, 2004, Decision Support Systems.

[7] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, Journal of Artificial Intelligence Research.

[8] Peter Dayan, et al. Q-learning, 1992, Machine Learning.

[9] David Carmel, et al. Exploration Strategies for Model-based Learning in Multi-agent Systems, 1999, Autonomous Agents and Multi-Agent Systems.

[10] Tim Kovacs, et al. Advances in Learning Classifier Systems, 2001, Lecture Notes in Computer Science.

[11] Marco Wiering, et al. Explorations in efficient reinforcement learning, 1999.

[12] Paul Bourgine, et al. Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty, 1999, Machine Learning.

[13] D. Sofge. The Role of Exploration in Learning Control, 1992.

[14] Sebastian Thrun, et al. The role of exploration in learning control, 1992.

[15] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[16] Pattie Maes, et al. Explore/Exploit Strategies in Autonomy, 1996.

[17] Stewart W. Wilson. Classifier Fitness Based on Accuracy, 1995, Evolutionary Computation.

[18] J. R. Moore, et al. The theory of the growth of the firm twenty-five years after, 1960.

[19] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.