On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games
Bruno Scherrer | Olivier Pietquin | Bilal Piot | Julien Pérolat
[1] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[2] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[3] Rémi Munos,et al. Performance Bounds in Lp-norm for Approximate Value Iteration , 2007, SIAM J. Control. Optim..
[4] Bruno Scherrer,et al. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes , 2012, NIPS.
[5] K. I. M. McKinnon,et al. On the Generation of Markov Decision Processes , 1995 .
[6] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[7] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[8] J. Neumann,et al. Theory of games and economic behavior , 1945, 100 Years of Math Milestones.
[9] Jeff G. Schneider,et al. Policy Search by Dynamic Programming , 2003, NIPS.
[10] Bruno Scherrer,et al. Approximate Policy Iteration Schemes: A Comparison , 2014, ICML.
[11] Michail G. Lagoudakis,et al. Value Function Approximation in Zero-Sum Markov Games , 2002, UAI.
[12] Matthieu Geist,et al. Approximate Modified Policy Iteration , 2012, ICML.
[13] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[14] Bruno Scherrer,et al. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games , 2015, ICML.
[15] L. Shapley,et al. Stochastic Games , 1953, Proceedings of the National Academy of Sciences.
[16] Dimitri P. Bertsekas,et al. Stochastic shortest path games: theory and algorithms , 1997 .
[17] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[18] Bart De Schutter,et al. A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[19] David K. Smith,et al. Dynamic Programming and Optimal Control. Volume 1 , 1996 .
[20] Matthieu Geist,et al. Approximate Modified Policy Iteration , 2012 .