Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time
Zhaoran Wang | Weichen Wang | Zhuoran Yang | Jiequn Han