[1] Philip Wolfe,et al. Contributions to the theory of games , 1953 .
[2] J. Nash. Equilibrium Points in N-Person Games , 1950, Proceedings of the National Academy of Sciences of the United States of America.
[3] J. Robinson. An Iterative Method of Solving a Game , 1951, Classics in Game Theory.
[4] L. Shapley,et al. Stochastic Games , 1953, Proceedings of the National Academy of Sciences.
[5] H. W. Kuhn,et al. Extensive Games and the Problem of Information , 1953 .
[6] A. M. Fink,et al. Equilibrium in a stochastic $n$-person game , 1964 .
[7] J. Goodman. Note on Existence and Uniqueness of Equilibrium Points for Concave N-Person Games , 1965 .
[8] Edward Gaughan,et al. Introduction to Analysis , 1969 .
[9] H. Simon,et al. From substantive to procedural rationality , 1976 .
[10] O. J. Vrieze,et al. Stochastic Games with Finite State and Action Spaces , 1988 .
[11] L. C. Thomas,et al. Stochastic Games with Finite State and Action Spaces , 1988 .
[12] C. Watkins. Learning from delayed rewards , 1989 .
[13] Itzhak Gilboa,et al. Bounded Versus Unbounded Rationality: The Tyranny of the Weak , 1989 .
[14] Peter J. Jansen,et al. Using knowledge about the opponent in game-tree search , 1992 .
[15] Ronald J. Williams,et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .
[16] Maja J. Mataric,et al. Reward Functions for Accelerated Learning , 1994, ICML.
[17] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[18] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[19] Sandip Sen,et al. Learning to Coordinate without Sharing Information , 1994, AAAI.
[20] Ariel Rubinstein,et al. A Course in Game Theory , 1995 .
[21] Shlomo Zilberstein,et al. Models of Bounded Rationality , 1995 .
[22] Csaba Szepesvári,et al. A Generalized Reinforcement-Learning Model: Convergence and Applications , 1996, ICML.
[23] J. Filar,et al. Competitive Markov Decision Processes , 1996 .
[24] David Carmel,et al. Learning Models of Intelligent Agents , 1996, AAAI/IAAI, Vol. 1.
[25] T. Cormen,et al. Model-based Learning of Interaction Strategies in Multi-agent Systems , 1997 .
[26] A. Rubinstein. Modeling Bounded Rationality , 1998 .
[27] H. Kuhn. Classics in Game Theory , 1997 .
[28] A. Rubinstein,et al. Games with Procedurally Rational Players , 1997 .
[29] Milos Hauskrecht,et al. Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.
[30] Ian Frank,et al. Soccer Server: A Tool for Research on Multiagent Systems , 1998, Appl. Artif. Intell..
[31] Craig Boutilier,et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.
[32] Doina Precup,et al. Intra-Option Learning about Temporally Abstract Actions , 1998, ICML.
[33] Michael P. Wellman,et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm , 1998, ICML.
[34] M. Veloso,et al. Bounding the suboptimality of reusing subproblems , 1999, IJCAI 1999.
[35] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[36] Andrew Y. Ng,et al. Policy Search via Density Estimation , 1999, NIPS.
[37] Manuela M. Veloso,et al. On Behavior Classification in Adversarial Environments , 2000, DARS.
[38] Peter L. Bartlett,et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent , 2000, ICML.
[39] Andrew G. Barto,et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density , 2001, ICML.
[40] Peter Stone,et al. Leading Best-Response Strategies in Repeated Games , 2001, International Joint Conference on Artificial Intelligence.
[41] Tuomas Sandholm,et al. Bargaining with limited computation: Deliberation equilibrium , 2001, Artif. Intell..
[42] Michael L. Littman,et al. Friend-or-Foe Q-learning in General-Sum Games , 2001, ICML.
[43] Manuela M. Veloso,et al. Multiagent learning using a variable learning rate , 2002, Artif. Intell..
[44] Manuela M. Veloso,et al. Planning for Distributed Execution through Use of Probabilistic Opponent Models , 2002, AIPS.
[45] Manuela Veloso,et al. Tree based hierarchical reinforcement learning , 2002 .
[46] William T. B. Uther,et al. Adversarial Reinforcement Learning , 2003 .
[47] Keith B. Hall,et al. Correlated Q-Learning , 2003, ICML.
[48] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[49] Andrew W. Moore,et al. Prioritized sweeping: Reinforcement learning with less data and less time , 2004, Machine Learning.
[50] Sridhar Mahadevan,et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.
[51] Sridhar Mahadevan,et al. Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results , 2005, Machine Learning.