Online Reinforcement Learning in Stochastic Games
[1] L. Shapley,et al. Stochastic Games , 1953, Proceedings of the National Academy of Sciences.
[2] Arie Hordijk,et al. Dynamic programming and Markov potential theory , 1974 .
[3] A. Federgruen. On N-person stochastic games with denumerable state space , 1978, Advances in Applied Probability.
[4] J. Wal,et al. Successive approximations for average reward Markov games , 1980 .
[5] J. Hunter. Generalized inverses and their application to applied probability problems , 1982 .
[6] J. Hunter,et al. Stationary Distributions and Mean First Passage Times of Perturbed Markov Chains , 1992 .
[7] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[8] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[9] C. D. Meyer,et al. Markov chain sensitivity measured by mean first passage times , 2000 .
[10] Manuela M. Veloso,et al. Rational and Convergent Learning in Stochastic Games , 2001, IJCAI.
[11] Michail G. Lagoudakis,et al. Value Function Approximation in Zero-Sum Markov Games , 2002, UAI.
[12] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[13] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003 .
[14] Laurent El Ghaoui,et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices , 2005, Oper. Res..
[15] Garud Iyengar,et al. Robust Dynamic Programming , 2005, Math. Oper. Res..
[16] Vincent Conitzer,et al. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents , 2003, Machine Learning.
[17] Peter Auer,et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning , 2006, NIPS.
[18] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[19] Yishay Mansour,et al. Online Markov Decision Processes , 2009, Math. Oper. Res..
[20] Massimiliano Pontil,et al. Empirical Bernstein Bounds and Sample-Variance Penalization , 2009, COLT.
[21] Ambuj Tewari,et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs , 2009, UAI.
[22] Aleksandrs Slivkins,et al. The Best of Both Worlds: Stochastic and Adversarial Bandits , 2012, COLT.
[23] András György,et al. The adversarial stochastic shortest path problem with unknown transition probabilities , 2012, AISTATS.
[24] Tor Lattimore,et al. PAC Bounds for Discounted MDPs , 2012, ALT.
[25] Peter L. Bartlett,et al. Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions , 2013, NIPS.
[26] András György,et al. Online Learning in Markov Decision Processes with Changing Cost Sequences , 2014, ICML.
[27] Shalabh Bhatnagar,et al. Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games , 2015, AAMAS.
[28] Bruno Scherrer,et al. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games , 2015, ICML.
[29] Christoph Dann,et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning , 2015, NIPS.
[30] Shie Mannor,et al. Reinforcement Learning in Robust Markov Decision Processes , 2013, Math. Oper. Res..
[31] Wouter M. Koolen,et al. Maximin Action Identification: A New Bandit Framework for Games , 2016, COLT.