Merging Individually Learned Optimal Results to Accelerate Coordination

By merging agents’ individually learned optimal value functions, agents can learn their optimal policies in a multiagent system. Prior knowledge of the task is used to decompose it into several subtasks, and this decomposition greatly reduces the state and action spaces. The optimal value function of each subtask is learned with the MAXQ-Q [1] algorithm. By defining lower and upper bounds on the value function of the whole task, we propose a novel online multiagent learning algorithm, LU-Q, which accelerates the learning of coordination among multiple agents through task decomposition and action pruning.
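To make the role of the bounds concrete, the following minimal Python sketch illustrates the general idea of pruning actions with lower and upper bounds on value estimates. It is not the paper's LU-Q algorithm; the class and method names (`LUQPruner`, `update`, `admissible_actions`) and the initialisation scheme are assumptions introduced for illustration only.

```python
# Illustrative sketch of interval-based action pruning (hypothetical names;
# the exact LU-Q update rules from the paper are not reproduced here).
from collections import defaultdict


class LUQPruner:
    """Keeps lower/upper bounds on Q-values and prunes dominated actions."""

    def __init__(self, actions, v_min, v_max):
        self.actions = list(actions)
        # Initialise bounds to the loosest values allowed by the task.
        self.lower = defaultdict(lambda: v_min)   # L(s, a)
        self.upper = defaultdict(lambda: v_max)   # U(s, a)

    def update(self, state, action, new_lower, new_upper):
        """Tighten the bounds for (state, action) after a learning step."""
        key = (state, action)
        self.lower[key] = max(self.lower[key], new_lower)
        self.upper[key] = min(self.upper[key], new_upper)

    def admissible_actions(self, state):
        """Drop any action whose upper bound falls below the best lower
        bound in this state: such an action can never be optimal, so the
        learner no longer needs to explore it."""
        best_lower = max(self.lower[(state, a)] for a in self.actions)
        return [a for a in self.actions
                if self.upper[(state, a)] >= best_lower]
```

In this sketch, pruning shrinks the set of actions an agent must consider in each state as the bounds tighten, which is the mechanism by which bound-based pruning can speed up coordination learning.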