暂无分享,去创建一个
[1] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[2] Sham M. Kakade,et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes , 2019, COLT.
[3] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[4] Olivier Pietquin,et al. Actor-Critic Fictitious Play in Simultaneous Move Multistage Games , 2018, AISTATS.
[5] Michael P. Wellman,et al. Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..
[6] Sepp Hochreiter,et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.
[7] Gauthier Gidel,et al. A Variational Inequality Perspective on Generative Adversarial Networks , 2018, ICLR.
[8] J. Neumann. A Model of General Economic Equilibrium , 1945 .
[9] Michael H. Bowling,et al. Actor-Critic Policy Optimization in Partially Observable Multiagent Environments , 2018, NeurIPS.
[10] Dmitriy Drusvyatskiy,et al. Stochastic model-based minimization of weakly convex functions , 2018, SIAM J. Optim..
[11] Christoph Dann,et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning , 2015, NIPS.
[12] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[13] Constantinos Daskalakis,et al. Training GANs with Optimism , 2017, ICLR.
[14] Tamer Basar,et al. Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms , 2019, Handbook of Reinforcement Learning and Control.
[15] Constantinos Daskalakis,et al. The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization , 2018, NeurIPS.
[16] Dmitriy Drusvyatskiy,et al. Stochastic subgradient method converges at the rate O(k-1/4) on weakly convex functions , 2018, ArXiv.
[17] Christos H. Papadimitriou,et al. Cycles in adversarial regularized learning , 2017, SODA.
[18] Chen-Yu Wei,et al. Online Reinforcement Learning in Stochastic Games , 2017, NIPS.
[19] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[20] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[21] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[22] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[23] Mathias Staudigl,et al. Stochastic Mirror Descent Dynamics and Their Convergence in Monotone Variational Inequalities , 2017, Journal of Optimization Theory and Applications.
[24] Serdar Yüksel,et al. Decentralized Q-Learning for Stochastic Teams and Games , 2015, IEEE Transactions on Automatic Control.
[25] Niao He,et al. Global Convergence and Variance Reduction for a Class of Nonconvex-Nonconcave Minimax Problems , 2020, NeurIPS.
[26] Tengyuan Liang,et al. Interaction Matters: A Note on Non-asymptotic Local Convergence of Generative Adversarial Networks , 2018, AISTATS.
[27] L. Shapley,et al. Stochastic Games* , 1953, Proceedings of the National Academy of Sciences.
[28] Noah Golowich,et al. Last Iterate is Slower than Averaged Iterate in Smooth Convex-Concave Saddle Point Problems , 2020, COLT.
[29] Michael I. Jordan,et al. On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems , 2019, ICML.
[30] Michael H. Bowling,et al. Regret Minimization in Games with Incomplete Information , 2007, NIPS.
[31] Tamer Basar,et al. Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games , 2019, NeurIPS.
[32] Jacob Abernethy,et al. Last-iterate convergence rates for min-max optimization , 2019, ArXiv.
[33] Ioannis Mitliagkas,et al. A Tight and Unified Analysis of Extragradient for a Whole Spectrum of Differentiable Games , 2019, ArXiv.
[34] Rémi Munos,et al. Neural Replicator Dynamics , 2019, ArXiv.
[35] Guillaume J. Laurent,et al. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems , 2012, The Knowledge Engineering Review.
[36] Haipeng Luo,et al. Linear Last-iterate Convergence for Matrix Games and Stochastic Games , 2020, ArXiv.
[37] Craig Boutilier,et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.
[38] Chuan-Sheng Foo,et al. Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile , 2018, ICLR.
[39] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[40] 丸山 徹. Convex Analysisの二,三の進展について , 1977 .
[41] F. Facchinei,et al. Finite-Dimensional Variational Inequalities and Complementarity Problems , 2003 .
[42] Zhuoran Yang,et al. A Theoretical Analysis of Deep Q-Learning , 2019, L4DC.
[43] Constantinos Daskalakis,et al. Last-Iterate Convergence: Zero-Sum Games and Constrained Min-Max Optimization , 2018, ITCS.
[44] Chi Jin,et al. Near-Optimal Reinforcement Learning with Self-Play , 2020, NeurIPS.
[45] Anne Condon,et al. On Algorithms for Simple Stochastic Games , 1990, Advances In Computational Complexity Theory.
[46] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..
[47] Jalaj Bhandari,et al. Global Optimality Guarantees For Policy Gradient Methods , 2019, ArXiv.
[48] Michael I. Jordan,et al. Policy-Gradient Algorithms Have No Guarantees of Convergence in Linear Quadratic Games , 2019, AAMAS.
[49] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[50] Jason D. Lee,et al. Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods , 2019, NeurIPS.
[51] Prateek Jain,et al. Efficient Algorithms for Smooth Minimax Optimization , 2019, NeurIPS.
[52] Nikolai Matni,et al. On the Sample Complexity of the Linear Quadratic Regulator , 2017, Foundations of Computational Mathematics.
[53] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[54] G. M. Korpelevich. The extragradient method for finding saddle points and other problems , 1976 .
[55] Niao He,et al. Global Convergence and Variance-Reduced Optimization for a Class of Nonconvex-Nonconcave Minimax Problems , 2020, ArXiv.
[56] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[57] Lillian J. Ratliff,et al. Global Convergence of Policy Gradient for Sequential Zero-Sum Linear Quadratic Dynamic Games , 2019, ArXiv.
[58] Mingrui Liu,et al. Non-Convex Min-Max Optimization: Provable Algorithms and Applications in Machine Learning , 2018, ArXiv.
[59] Pablo Hernandez-Leal,et al. A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity , 2017, ArXiv.
[60] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[61] Yongxin Chen,et al. Hybrid Block Successive Approximation for One-Sided Non-Convex Min-Max Problems: Algorithms and Applications , 2019, IEEE Transactions on Signal Processing.
[62] Weiwei Kong,et al. An Accelerated Inexact Proximal Point Method for Solving Nonconvex-Concave Min-Max Problems , 2019, SIAM J. Optim..
[63] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[64] Aryan Mokhtari,et al. A Unified Analysis of Extra-gradient and Optimistic Gradient Methods for Saddle Point Problems: Proximal Point Approach , 2019, AISTATS.
[65] Chi Jin,et al. Provable Self-Play Algorithms for Competitive Reinforcement Learning , 2020, ICML.
[66] Sham M. Kakade,et al. Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity , 2020, NeurIPS.
[67] Constantinos Daskalakis,et al. Near-optimal no-regret algorithms for zero-sum games , 2011, SODA '11.
[68] Bart De Schutter,et al. A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[69] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[70] Shie Mannor,et al. Optimistic Policy Optimization with Bandit Feedback , 2020, ICML.
[71] Shimon Whiteson,et al. Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.
[72] Michael I. Jordan,et al. Policy-Gradient Algorithms Have No Guarantees of Convergence in Continuous Action and State Multi-Agent Settings , 2019, ArXiv.
[73] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[74] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[75] Qiaomin Xie,et al. Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium , 2020, COLT 2020.
[76] Mingrui Liu,et al. Solving Weakly-Convex-Weakly-Concave Saddle-Point Problems as Successive Strongly Monotone Variational Inequalities , 2018 .
[77] Ioannis Mitliagkas,et al. Negative Momentum for Improved Game Dynamics , 2018, AISTATS.
[78] Karl Tuyls,et al. Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent , 2019, IJCAI.