Reinforcement Learning in Stationary Mean-field Games
[1] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[2] Hongyuan Zha,et al. Deep Mean Field Games for Learning Optimal Behavior Policy of Large Populations , 2017, ICLR 2018.
[3] Yan Ma,et al. Mean field stochastic games with binary actions: Stationary threshold policies , 2017, 2017 IEEE 56th Annual Conference on Decision and Control (CDC).
[4] Vijay R. Konda,et al. On Actor-Critic Algorithms , 2003, SIAM J. Control. Optim..
[5] R. Johari,et al. Equilibria of Dynamic Games with Many Players: Existence, Approximation, and Market Structure , 2011 .
[6] Matthew E. Taylor,et al. A survey and critique of multiagent deep reinforcement learning , 2018, Autonomous Agents and Multi-Agent Systems.
[7] Joseph Y. Halpern,et al. Multiagent learning in large anonymous games , 2009, AAMAS.
[8] Aditya Mahajan,et al. Team optimal control of coupled subsystems with mean-field sharing , 2014, 53rd IEEE Conference on Decision and Control.
[9] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[10] Shimon Whiteson,et al. Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.
[11] Peter W. Glynn,et al. Likelihood ratio gradient estimation for stochastic systems , 1990, CACM.
[12] Yi Wu,et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.
[13] Benjamin Van Roy,et al. MARKOV PERFECT INDUSTRY DYNAMICS WITH MANY FIRMS , 2008 .
[14] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[15] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[16] Reuven Y. Rubinstein,et al. Sensitivity Analysis and Performance Extrapolation for Computer Simulation Models , 1989, Oper. Res..
[17] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[18] C. D. Meyer,et al. Sensitivity of the stationary distribution vector for an ergodic Markov chain , 1986 .
[19] Minyi Huang,et al. Mean field stochastic games: Monotone costs and threshold policies , 2016 .
[20] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[21] Phillipp Meister,et al. Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods , 2016 .
[22] Peter E. Caines,et al. Mean Field Stochastic Adaptive Control , 2012, IEEE Transactions on Automatic Control.
[23] Ming Zhou,et al. Mean Field Multi-Agent Reinforcement Learning , 2018, ICML.
[24] S. Shankar Sastry,et al. On the Characterization of Local Nash Equilibria in Continuous Games , 2014, IEEE Transactions on Automatic Control.
[25] D. Bernhardt,et al. Anonymous sequential games: Existence and characterization of equilibria , 1995 .
[26] Minyi Huang,et al. Mean Field Stochastic Games with Binary Action Spaces and Monotone Costs , 2017, arXiv:1701.06661.
[27] Pierre Cardaliaguet,et al. Learning in mean field games: The fictitious play , 2015, arXiv:1507.06280.
[28] G. Rappl. On Linear Convergence of a Class of Random Search Algorithms , 1989 .
[29] David S. Leslie,et al. Reinforcement learning in games , 2004 .
[30] Benjamin Van Roy,et al. Computational Methods for Oblivious Equilibrium , 2010, Oper. Res..
[31] P. Weiss. L'hypothèse du champ moléculaire et la propriété ferromagnétique , 1907 .
[32] Jean C. Walrand,et al. How Bad Are Selfish Investments in Network Security? , 2011, IEEE/ACM Transactions on Networking.
[33] Sean P. Meyn,et al. Learning in Mean-Field Games , 2014, IEEE Transactions on Automatic Control.
[34] Peng Peng,et al. Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games , 2017, arXiv:1703.10069.
[35] Peter E. Caines,et al. Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle , 2006, Commun. Inf. Syst..
[36] J. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation , 1992 .
[37] Benjamin Van Roy,et al. Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games , 2005, NIPS.
[38] Aditya Mahajan,et al. Renewal Monte Carlo: Renewal Theory Based Reinforcement Learning , 2018, 2018 IEEE Conference on Decision and Control (CDC).
[39] J. L. Maryak,et al. Global random optimization by simultaneous perturbation stochastic approximation , 2001, Proceedings of the 2001 American Control Conference. (Cat. No.01CH37148).
[40] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[41] V. Borkar. Stochastic approximation with two time scales , 1997 .
[42] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[43] Gerhard Neumann,et al. Deep Reinforcement Learning for Swarm Systems , 2018, J. Mach. Learn. Res..
[44] Karl Tuyls,et al. Evolutionary Dynamics of Multi-Agent Learning: A Survey , 2015, J. Artif. Intell. Res..
[45] Enrique Munoz de Cote,et al. Decentralised Learning in Systems with Many, Many Strategic Agents , 2018, AAAI.
[46] Michael P. Wellman,et al. Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..
[47] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[48] R. Rosenthal,et al. Anonymous sequential games , 1988 .
[49] P. Lions,et al. Mean field games , 2007 .