
Value-function reinforcement learning in Markov games

Michael L. Littman

[1] Manuela Veloso, et al. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning, 2000.

[2] Ron Sun, et al. Rationality Assumptions and Optimality of Co-learning, 2000, PRIMA.

[3] Sandip Sen, et al. Evaluating concurrent reinforcement learners, 2000, Proceedings Fourth International Conference on MultiAgent Systems.

[4] Michael P. Wellman, et al. Experimental Results on Q-Learning for General-Sum Stochastic Games, 2000, ICML.

[5] Michael H. Bowling, et al. Convergence Problems of General-Sum Multiagent Reinforcement Learning, 2000, ICML.

[6] Csaba Szepesvári, et al. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms, 1999, Neural Computation.

[7] Michael P. Wellman, et al. Learning in dynamic noncooperative multiagent systems, 1999.

[8] Michael P. Wellman, et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998, ICML.

[9] Craig Boutilier, et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, 1998, AAAI/IAAI.

[10] V. Borkar. Asynchronous Stochastic Approximations, 1998.

[11] Jerzy A. Filar, et al. Markov Decision Processes: The Noncompetitive Case, 1997.

[12] Jerzy A. Filar, et al. Competitive Markov decision processes: with 57 illustrations, 1997.

[13] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[14] Craig Boutilier, et al. Planning, Learning and Coordination in Multiagent Decision Processes, 1996, TARK.

[15] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.

[16] Sandip Sen, et al. Learning to Coordinate without Sharing Information, 1994, AAAI.

[17] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.

[18] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[19] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.

[20] Richard S. Sutton, et al. Learning and Sequential Decision Making, 1989.

[21] C. Watkins. Learning from delayed rewards, 1989.

[22] Stef Tijs, et al. Fictitious play applied to sequences of games and discounted stochastic games, 1982.

[23] R. Milner. Mathematical Centre Tracts, 1976.

[24] R. Howard. Dynamic Programming and Markov Processes, 1960.

[25] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.

[26] J. Neumann, et al. Theory of games and economic behavior, 1945, 100 Years of Math Milestones.