Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopically Rational Followers?
[1] Chi Jin,et al. V-Learning - A Simple, Efficient, Decentralized Algorithm for Multiagent RL , 2021, ArXiv.
[2] T. Başar,et al. Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games , 2021, Dynamic Games and Applications.
[3] Song Mei,et al. When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently? , 2021, ICLR.
[4] Agnieszka Wiszniewska-Matyszkiel,et al. Dynamic Stackelberg duopoly with sticky prices and a myopic follower , 2021, Operational Research.
[5] Stuart J. Russell,et al. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism , 2021, IEEE Transactions on Information Theory.
[6] Huan Wang,et al. Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games , 2021, NeurIPS.
[7] Noah Golowich,et al. Independent Policy Gradient Methods for Competitive Reinforcement Learning , 2021, NeurIPS.
[8] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints , 2021, NeurIPS.
[9] Zhuoran Yang,et al. Is Pessimism Provably Efficient for Offline RL? , 2020, ICML.
[10] Quanquan Gu,et al. Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes , 2020, COLT.
[11] Lin F. Yang,et al. Minimax Sample Complexity for Turn-based Stochastic Game , 2020, UAI.
[12] Michael I. Jordan,et al. Bridging Exploration and General Function Approximation in Reinforcement Learning: Provably Efficient Kernel and Neural Value Iterations , 2020, ArXiv.
[13] Tiancheng Yu,et al. Provably Efficient Online Agnostic Learning in Markov Games , 2020, ArXiv.
[14] Qinghua Liu,et al. A Sharp Analysis of Model-based Reinforcement Learning with Self-Play , 2020, ICML.
[15] S. Du,et al. Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon , 2020, COLT.
[16] Marc G. Bellemare,et al. The Importance of Pessimism in Fixed-Dataset Policy Optimization , 2020, ICLR.
[17] Emma Brunskill,et al. Provably Good Batch Reinforcement Learning Without Great Exploration , 2020, ArXiv.
[18] Lin F. Yang,et al. Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity , 2020, NeurIPS.
[19] Zhaoran Wang,et al. A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic , 2020, SIAM J. Optim..
[20] Csaba Szepesvári,et al. Bandit Algorithms , 2020 .
[21] Nando de Freitas,et al. Critic Regularized Regression , 2020, NeurIPS.
[22] Chi Jin,et al. Near-Optimal Reinforcement Learning with Self-Play , 2020, NeurIPS.
[23] S. Levine,et al. Accelerating Online Reinforcement Learning with Offline Datasets , 2020, ArXiv.
[24] S. Levine,et al. Conservative Q-Learning for Offline Reinforcement Learning , 2020, NeurIPS.
[25] Mengdi Wang,et al. Model-Based Reinforcement Learning with Value-Targeted Regression , 2020, L4DC.
[26] Lantao Yu,et al. MOPO: Model-based Offline Policy Optimization , 2020, NeurIPS.
[27] T. Joachims,et al. MOReL : Model-Based Offline Reinforcement Learning , 2020, NeurIPS.
[28] David C. Parkes,et al. The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies , 2020, ArXiv.
[29] Xiangyang Ji,et al. Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition , 2020, NeurIPS.
[30] Vikash Kumar,et al. A Game Theoretic Framework for Model Based Reinforcement Learning , 2020, ICML.
[31] Nan Jiang,et al. Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison , 2020, ArXiv.
[32] Mykel J. Kochenderfer,et al. Learning Near Optimal Policies with Low Inherent Bellman Error , 2020, ICML.
[33] Shie Mannor,et al. Optimistic Policy Optimization with Bandit Feedback , 2020, ICML.
[34] Martin A. Riedmiller,et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning , 2020, ICLR.
[35] Zhuoran Yang,et al. Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium , 2020, COLT.
[36] Chi Jin,et al. Provable Self-Play Algorithms for Competitive Reinforcement Learning , 2020, ICML.
[37] Akshay Krishnamurthy,et al. Reward-Free Exploration for Reinforcement Learning , 2020, ICML.
[38] Chi Jin,et al. Provably Efficient Exploration in Policy Optimization , 2019, ICML.
[39] T. Başar,et al. Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms , 2019, Handbook of Reinforcement Learning and Control.
[40] N. Gatti,et al. Computing a Pessimistic Stackelberg Equilibrium with Multiple Followers: The Mixed-Pure Case , 2019, Algorithmica.
[41] Alessandro Lazaric,et al. Frequentist Regret Bounds for Randomized Least-Squares Value Iteration , 2019, AISTATS.
[42] Marcin Andrychowicz,et al. Solving Rubik's Cube with a Robot Hand , 2019, ArXiv.
[43] Yifan Wu,et al. Behavior Regularized Offline Reinforcement Learning , 2019, ArXiv.
[44] Matthew E. Taylor,et al. A survey and critique of multiagent deep reinforcement learning , 2018, Autonomous Agents and Multi-Agent Systems.
[45] Afshin Oroojlooyjadid,et al. A review of cooperative multi-agent deep reinforcement learning , 2019, Applied Intelligence.
[46] Lin F. Yang,et al. Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity , 2019, AISTATS.
[47] Pingzhong Tang,et al. Learning Optimal Strategies to Commit To , 2019, AAAI.
[48] Pierre Baldi,et al. Solving the Rubik’s cube with deep reinforcement learning and search , 2019, Nature Machine Intelligence.
[49] Natasha Jaques,et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog , 2019, ArXiv.
[50] Lin F. Yang,et al. Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal , 2019, COLT.
[51] Lillian J. Ratliff,et al. Convergence of Learning Dynamics in Stackelberg Games , 2019, ArXiv.
[52] Sergey Levine,et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction , 2019, NeurIPS.
[53] Mengdi Wang,et al. Feature-Based Q-Learning for Two-Player Stochastic Games , 2019, ArXiv.
[54] Fernando Ordóñez,et al. Stationary Strong Stackelberg Equilibrium in Discounted Stochastic Games , 2019, IEEE Transactions on Automatic Control.
[55] Mengdi Wang,et al. Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound , 2019, ICML.
[56] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[57] Fernando Ordóñez,et al. On the Value Iteration method for dynamic Strong Stackelberg Equilibria , 2019 .
[58] Mengdi Wang,et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features , 2019, ICML.
[59] Zhuoran Yang,et al. A Theoretical Analysis of Deep Q-Learning , 2019, L4DC.
[60] Emma Brunskill,et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds , 2019, ICML.
[61] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[62] Demis Hassabis,et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.
[63] Qiang Liu,et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation , 2018, NeurIPS.
[64] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[65] Lillian J. Ratliff,et al. Adaptive Incentive Design , 2018, IEEE Transactions on Automatic Control.
[66] Saeed Ghadimi,et al. Approximation Methods for Bilevel Programming , 2018, 1802.02246.
[67] Romain Laroche,et al. Safe Policy Improvement with Baseline Bootstrapping , 2017, ICML.
[68] Chen-Yu Wei,et al. Online Reinforcement Learning in Stochastic Games , 2017, NIPS.
[69] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[70] Stefano Coniglio,et al. Bilevel Programming Approaches to the Computation of Optimistic and Pessimistic Single-Leader-Multi-Follower Equilibria , 2017, SEA.
[71] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[72] Suresh P. Sethi,et al. A review of dynamic Stackelberg game models , 2016 .
[73] N. Gatti,et al. Methods for finding leader-follower equilibria with multiple followers , 2017, AAMAS.
[74] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[75] Meixia Tao,et al. Caching incentive design in wireless D2D networks: A Stackelberg game approach , 2016, 2016 IEEE International Conference on Communications (ICC).
[76] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[77] Bruno Scherrer,et al. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games , 2015, ICML.
[78] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[79] Ariel D. Procaccia,et al. Learning Optimal Commitment to Overcome Insecurity , 2014, NIPS.
[80] Ming Jin,et al. Social game for building energy efficiency: Incentive design , 2014, 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[81] Yongdong Wu,et al. Incentive Mechanism Design for Heterogeneous Peer-to-Peer Networks: A Stackelberg Game Approach , 2014, IEEE Transactions on Mobile Computing.
[82] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[83] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[84] Milind Tambe,et al. Security and Game Theory - Algorithms, Deployed Systems, Lessons Learned , 2011 .
[85] Vincent Conitzer,et al. Stackelberg vs. Nash in Security Games: An Extended Investigation of Interchangeability, Equivalence, and Uniqueness , 2011, J. Artif. Intell. Res..
[86] Csaba Szepesvári,et al. Error Propagation for Approximate Policy and Value Iteration , 2010, NIPS.
[87] Peter Bro Miltersen,et al. Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor , 2010, JACM.
[88] Bernhard von Stengel,et al. Leadership games with convex strategy sets , 2010, Games Econ. Behav..
[89] Vincent Conitzer,et al. Learning and Approximating the Optimal Strategy to Commit To , 2009, SAGT.
[90] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[91] Bart De Schutter,et al. A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[92] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[93] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[94] Vincent Conitzer,et al. Computing the optimal strategy to commit to , 2006, EC '06.
[95] Y. Narahari,et al. Design of Incentive Compatible Mechanisms for Stackelberg Problems , 2005, WINE.
[96] Michael P. Wellman,et al. Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..
[97] Keith B. Hall,et al. Correlated Q-Learning , 2003, ICML.
[98] Michail G. Lagoudakis,et al. Value Function Approximation in Zero-Sum Markov Games , 2002, UAI.
[99] Vincent Conitzer,et al. Complexity of Mechanism Design , 2002, UAI.
[100] Tim Roughgarden,et al. Stackelberg scheduling strategies , 2001, STOC '01.
[101] Michael L. Littman,et al. Friend-or-Foe Q-learning in General-Sum Games , 2001, ICML.
[102] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[103] J. Filar,et al. Competitive Markov Decision Processes , 1996 .
[104] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[105] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[106] A. Haurie,et al. Sequential Stackelberg equilibria in two-person games , 1985 .
[107] J. Cruz,et al. On the Stackelberg strategy in nonzero-sum games , 1973 .
[108] L. Shapley,et al. Stochastic Games , 1953, Proceedings of the National Academy of Sciences.
[109] Quanquan Gu,et al. Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation , 2021, ArXiv.
[110] W. Yin,et al. A Single-Timescale Stochastic Bilevel Optimization Method , 2021, ArXiv.
[111] Yuandong Tian,et al. Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games , 2021, ArXiv.
[112] Shie Mannor,et al. Regularized Policy Iteration with Nonparametric Function Spaces , 2016, J. Mach. Learn. Res..
[113] Matthieu Geist,et al. Approximate modified policy iteration and its application to the game of Tetris , 2015, J. Mach. Learn. Res..
[114] Dimitri P. Bertsekas,et al. Neuro-Dynamic Programming , 2009, Encyclopedia of Optimization.
[115] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[116] Eric van Damme,et al. Non-Cooperative Games , 2000 .
[117] Jose B. Cruz,et al. Stackelberg strategies and incentives in multiperson deterministic decision problems , 1984, IEEE Transactions on Systems, Man, and Cybernetics.