Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
Bruno Scherrer | Olivier Pietquin | Bilal Piot | Julien Pérolat
[1] David K. Smith, et al. Dynamic Programming and Optimal Control. Volume 1, 1996.
[2] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[3] Bart De Schutter, et al. A Comprehensive Survey of Multiagent Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[4] Bruno Scherrer, et al. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes, 2012, NIPS.
[5] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[6] Leo Breiman, et al. Classification and Regression Trees, 1984.
[7] Michail G. Lagoudakis, et al. Value Function Approximation in Zero-Sum Markov Games, 2002, UAI.
[8] J. Wal. Discounted Markov games: Generalized policy iteration method, 1978.
[9] Alessandro Lazaric, et al. Analysis of a Classification-based Policy Iteration Algorithm, 2010, ICML.
[10] E. Kandel, et al. Proceedings of the National Academy of Sciences of the United States of America. Annual subject and author indexes, 1990, Proceedings of the National Academy of Sciences of the United States of America.
[11] Narendra Karmarkar, et al. A new polynomial-time algorithm for linear programming, 1984, STOC '84.
[12] Bruno Scherrer, et al. Classification-based Policy Iteration with a Critic, 2011, ICML.
[13] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[14] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[15] Peter Bro Miltersen, et al. Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor, 2010, JACM.
[16] Matthieu Geist, et al. Approximate Modified Policy Iteration, 2012, ICML.
[17] Manuela M. Veloso, et al. Rational and Convergent Learning in Stochastic Games, 2001, IJCAI.
[18] Csaba Szepesvári, et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[19] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[20] J. Neumann, et al. Theory of games and economic behavior, 1945, 100 Years of Math Milestones.
[21] Keith B. Hall, et al. Correlated Q-Learning, 2003, ICML.
[22] Dimitri P. Bertsekas, et al. Stochastic shortest path games: theory and algorithms, 1997.
[23] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[24] Michael P. Wellman, et al. Nash Q-Learning for General-Sum Stochastic Games, 2003, J. Mach. Learn. Res.
[25] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[26] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[27] Michail G. Lagoudakis, et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers, 2003, ICML.
[28] Jean-Gabriel Ganascia, et al. Learning Strategies in Games by Anticipation, 1997, IJCAI.