Toward Guidelines for Modeling Learning Agents in Multiagent-Based Simulation: Implications from Q-Learning and Sarsa Agents

This paper focuses on how simulation results in multiagent-based simulation (MABS) are sensitive to agent modeling, and investigates this sensitivity by comparing results for agents equipped with different reinforcement learning mechanisms, namely Q-learning and Sarsa. Through an analysis of simulation results in a bargaining game, one of the canonical examples in game theory, the following implications are revealed: (1) even a slight difference in the learning mechanism has an essential influence on simulation results; (2) testing in static and dynamic environments highlights different tendencies in the results; and (3) three stages are found for both Q-learning and Sarsa agents in the dynamic environment (i.e., (a) competition, (b) cooperation, and (c) learning impossible), while no such stages are found in the static environment. From these three implications, the following very rough guidelines for modeling agents can be derived: (1) cross-element validation for specifying the key factors that affect simulation results; (2) a comparison of results between static and dynamic environments for determining candidates to be investigated in detail; and (3) sensitivity analysis for specifying the applicable range of learning agents.
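
To make the "slight difference" between the two mechanisms concrete, the following minimal sketch shows the standard tabular update rules for Q-learning and Sarsa under an epsilon-greedy policy. The parameter values, the discretized action set, and the function names are illustrative assumptions for exposition, not the settings used in the paper's bargaining-game experiments.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # illustrative learning rate, discount, exploration rate
ACTIONS = list(range(10))                # e.g., discretized bargaining demands (assumption)

Q = defaultdict(float)                   # tabular action values keyed by (state, action)

def epsilon_greedy(state):
    """Choose a random action with probability EPSILON, otherwise the greedy action."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, r, s_next):
    """Q-learning: off-policy target bootstraps from the maximum value in the next state."""
    target = r + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    """Sarsa: on-policy target bootstraps from the action actually taken in the next state."""
    target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```

The two update functions differ only in the bootstrap term of the target (the greedy maximum versus the action actually selected); this single-term difference is the kind of "slight difference" whose influence on MABS results the paper examines.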
