This paper investigates the use of experience generalization for concurrent, on-line policy learning in multi-agent scenarios using reinforcement learning (RL) algorithms. When agents learn concurrently, the scenario becomes non-stationary: the reward one agent receives for applying an action in a state depends on the behavior of the other agents. Such a non-stationary scenario can be viewed as a two-player game in which an agent and an opponent (representing the other agents and the environment) each select an action from those available in the current state; together, these actions determine the possible next state. One RL algorithm applicable to this setting is Minimax-Q, which is known to converge to the equilibrium solution in the limit. However, finding optimal control policies with any RL algorithm (Minimax-Q included) can be very time consuming. We investigate the use of experience generalization for increasing the rate of convergence of RL algorithms, and contribute a new learning algorithm, Minimax-QS, which incorporates experience generalization into the Minimax-Q algorithm. We also prove its convergence to the Minimax-Q values under suitable conditions.
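The update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `minimax_qs_update`, the `similarity` weighting function, and the tabular layout are our own choices, and the state value is approximated by the pure-strategy maximin (the actual Minimax-Q algorithm computes the value of a state via linear programming over mixed policies).

```python
import numpy as np

def minimax_qs_update(Q, s, a, o, r, s_next, alpha=0.1, gamma=0.9,
                      similarity=None):
    """One Minimax-QS step (sketch).

    Q: array of shape [n_states, n_actions, n_opponent_actions].
    similarity(s, a, t, b): hypothetical function returning a weight
    in [0, 1] that spreads the experience (s, a) to similar pairs
    (t, b); with similarity=None the update reduces to plain
    Minimax-Q on the visited pair only.

    Simplification: the next-state value uses the pure-strategy
    maximin; true Minimax-Q solves a linear program over mixed
    policies to get this value.
    """
    # Maximin value of the successor state (pure-strategy approximation).
    v_next = np.max(np.min(Q[s_next], axis=1))
    target = r + gamma * v_next

    n_states, n_actions, _ = Q.shape
    for t in range(n_states):
        for b in range(n_actions):
            # Full weight on the visited pair; spread elsewhere.
            if (t, b) == (s, a):
                w = 1.0
            else:
                w = similarity(s, a, t, b) if similarity else 0.0
            if w > 0.0:
                Q[t, b, o] += alpha * w * (target - Q[t, b, o])
    return Q
```

With a similarity function that is 1 only on the visited pair, the rule is exactly a Minimax-Q style temporal-difference update; a broader similarity function is what lets one experience update many state-action pairs at once, which is the source of the faster convergence the paper studies.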