[1] Chi Jin, et al. Provable Self-Play Algorithms for Competitive Reinforcement Learning, 2020, ICML.
[2] Saeid Nahavandi, et al. Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications, 2018, IEEE Transactions on Cybernetics.
[3] Tamer Basar, et al. Non-Cooperative Inverse Reinforcement Learning, 2019, NeurIPS.
[4] Yuxin Chen, et al. Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model, 2020, NeurIPS.
[5] Tamer Basar, et al. Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents, 2018, ICML.
[6] Michael H. Bowling, et al. Actor-Critic Policy Optimization in Partially Observable Multiagent Environments, 2018, NeurIPS.
[7] William H. Sandholm, et al. Learning in Games via Reinforcement and Regularization, 2014, Math. Oper. Res.
[8] Mengdi Wang, et al. Feature-Based Q-Learning for Two-Player Stochastic Games, 2019, ArXiv.
[9] George B. Dantzig, et al. Linear programming and extensions, 1965.
[10] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[11] Wotao Yin, et al. Does Knowledge Transfer Always Help to Learn a Better Policy?, 2019, ArXiv.
[12] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[13] Kaiqing Zhang, et al. Finite-Sample Analysis for Decentralized Batch Multi-Agent Reinforcement Learning with Networked Agents, 2018.
[14] Chi Jin, et al. Near-Optimal Reinforcement Learning with Self-Play, 2020, NeurIPS.
[15] Mengdi Wang, et al. Randomized Linear Programming Solves the Discounted Markov Decision Problem in Nearly-Linear (Sometimes Sublinear) Running Time, 2017, ArXiv.
[16] Haipeng Luo, et al. Fast Convergence of Regularized Learning in Games, 2015, NIPS.
[17] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[18] Olivier Pietquin, et al. Actor-Critic Fictitious Play in Simultaneous Move Multistage Games, 2018, AISTATS.
[19] Michael P. Wellman, et al. Nash Q-Learning for General-Sum Stochastic Games, 2003, J. Mach. Learn. Res.
[20] Paul W. Goldberg, et al. Learning Equilibria of Games via Payoff Queries, 2013, EC '13.
[21] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[22] Enrique Mallada, et al. The Role of Convexity in Saddle-Point Dynamics: Lyapunov Function and Robustness, 2016, IEEE Transactions on Automatic Control.
[23] Zhuoran Yang, et al. A Theoretical Analysis of Deep Q-Learning, 2019, L4DC.
[24] Bart De Schutter, et al. A Comprehensive Survey of Multiagent Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[25] Ofir Nachum, et al. Path Consistency Learning in Tsallis Entropy Regularized MDPs, 2018, ICML.
[26] Bruno Scherrer, et al. On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games, 2016, AISTATS.
[27] F. Facchinei, et al. Finite-Dimensional Variational Inequalities and Complementarity Problems, 2003.
[28] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[29] Lin F. Yang, et al. Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal, 2020, COLT.
[30] S. Vajda, et al. Games and Decisions: Introduction and Critical Survey, 1958.
[31] E. Rowland. Theory of Games and Economic Behavior, 1946, Nature.
[32] Benjamin Van Roy, et al. Model-based Reinforcement Learning and the Eluder Dimension, 2014, NIPS.
[33] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[34] Tamer Basar, et al. Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms, 2019, Handbook of Reinforcement Learning and Control.
[35] Chen-Yu Wei, et al. Online Reinforcement Learning in Stochastic Games, 2017, NIPS.
[36] L. Buşoniu, et al. A comprehensive survey of multi-agent reinforcement learning, 2011.
[37] M. J. M. Jansen, et al. Regularity and Stability of Equilibrium Points of Bimatrix Games, 1981, Math. Oper. Res.
[38] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[39] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
[40] Sham M. Kakade, et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes, 2019, COLT.
[41] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[42] Harold R. Parks, et al. The Implicit Function Theorem, 2002.
[43] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[44] Ariel Rubinstein, et al. A Course in Game Theory, 1995.
[45] John Fearnley, et al. Finding Approximate Nash Equilibria of Bimatrix Games via Payoff Queries, 2013, ACM Trans. Economics and Comput.
[46] Devavrat Shah, et al. On Reinforcement Learning for Turn-based Zero-sum Markov Games, 2020, FODS.
[47] Keith B. Hall, et al. Correlated Q-Learning, 2003, ICML.
[48] Tor Lattimore, et al. PAC Bounds for Discounted MDPs, 2012, ALT.
[49] Michal Valko, et al. Planning in entropy-regularized Markov decision processes and games, 2019, NeurIPS.
[50] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[51] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[52] Akshay Krishnamurthy, et al. Reward-Free Exploration for Reinforcement Learning, 2020, ICML.
[53] Martin Grötschel, et al. The ellipsoid method and its consequences in combinatorial optimization, 1981, Comb.
[54] Narendra Karmarkar, et al. A new polynomial-time algorithm for linear programming, 1984, Comb.
[55] Lin F. Yang, et al. Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity, 2019, AISTATS.
[56] Amnon Shashua, et al. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving, 2016, ArXiv.
[57] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[58] Bruno Scherrer, et al. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games, 2015, ICML.
[59] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[60] Stephen D. Patek, et al. Stochastic and shortest path games: theory and algorithms, 1997.
[61] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[62] Andrew W. Moore, et al. Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems, 1999, IJCAI.
[63] Mengdi Wang, et al. Randomized Linear Programming Solves the Discounted Markov Decision Problem in Nearly-Linear Running Time, 2017, ArXiv.
[64] Michael Kearns, et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, 1998, NIPS.
[65] Vijay Janapa Reddi, et al. Deep Reinforcement Learning for Cyber Security, 2019, IEEE Transactions on Neural Networks and Learning Systems.
[66] Tengyuan Liang, et al. Interaction Matters: A Note on Non-asymptotic Local Convergence of Generative Adversarial Networks, 2018, AISTATS.
[67] Michael L. Littman, et al. Friend-or-Foe Q-learning in General-Sum Games, 2001, ICML.
[68] Qiaomin Xie, et al. Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium, 2020, COLT.
[69] Dimitri P. Bertsekas, et al. Stochastic shortest path games: theory and algorithms, 1997.
[70] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[71] Xian Wu, et al. Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model, 2018, NeurIPS.
[72] Tamer Basar, et al. Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games, 2019, NeurIPS.
[73] H. Raiffa, et al. Games and Decisions: Introduction and Critical Survey, 1958.
[74] Lacra Pavel, et al. On the Properties of the Softmax Function with Application in Game Theory and Reinforcement Learning, 2017, ArXiv.
[75] Nesa L'abbe Wu, et al. Linear programming and extensions, 1981.
[76] Olivier Pietquin, et al. Learning Nash Equilibrium for General-Sum Markov Games from Batch Data, 2016, AISTATS.