This paper investigates the use of experience generalization for concurrent, on-line policy learning in multi-agent scenarios using reinforcement learning (RL) algorithms. When agents learn concurrently, the scenario becomes non-stationary: the reward one agent receives for applying an action in a state depends on the behavior of the other agents. Such a non-stationary scenario can be viewed as a two-player game in which an agent and an opponent (representing the other agents and the environment) each select an action from those available in the current state; together, these actions determine the possible next state. One RL algorithm applicable to this setting is Minimax-Q, which is known to converge to the equilibrium solution in the limit. However, finding optimal control policies with any RL algorithm (Minimax-Q included) can be very time consuming. We investigate the use of experience generalization for increasing the rate of convergence of RL algorithms, and contribute a new learning algorithm, Minimax-QS, which incorporates experience generalization into the Minimax-Q algorithm. We also prove its convergence to the Minimax-Q values under suitable conditions.
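The update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `minimax_qs_update`, the `similarity` weighting function, and the tabular layout are our own choices, and the state value is approximated by the pure-strategy maximin (the actual Minimax-Q algorithm computes the value of a state via linear programming over mixed policies).

```python
import numpy as np

def minimax_qs_update(Q, s, a, o, r, s_next, alpha=0.1, gamma=0.9,
                      similarity=None):
    """One Minimax-QS step (sketch).

    Q: array of shape [n_states, n_actions, n_opponent_actions].
    similarity(s, a, t, b): hypothetical function returning a weight
    in [0, 1] that spreads the experience (s, a) to similar pairs
    (t, b); with similarity=None the update reduces to plain
    Minimax-Q on the visited pair only.

    Simplification: the next-state value uses the pure-strategy
    maximin; true Minimax-Q solves a linear program over mixed
    policies to get this value.
    """
    # Maximin value of the successor state (pure-strategy approximation).
    v_next = np.max(np.min(Q[s_next], axis=1))
    target = r + gamma * v_next

    n_states, n_actions, _ = Q.shape
    for t in range(n_states):
        for b in range(n_actions):
            # Full weight on the visited pair; spread elsewhere.
            if (t, b) == (s, a):
                w = 1.0
            else:
                w = similarity(s, a, t, b) if similarity else 0.0
            if w > 0.0:
                Q[t, b, o] += alpha * w * (target - Q[t, b, o])
    return Q
```

With a similarity function that is 1 only on the visited pair, the rule is exactly a Minimax-Q style temporal-difference update; a broader similarity function is what lets one experience update many state-action pairs at once, which is the source of the faster convergence the paper studies.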