Michael I. Jordan | Zhuoran Yang | Zhaoran Wang | Han Zhong
[1] Yuandong Tian,et al. Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games , 2021, ArXiv.
[2] Yifan Wu,et al. Behavior Regularized Offline Reinforcement Learning , 2019, ArXiv.
[3] Chi Jin,et al. Provably Efficient Exploration in Policy Optimization , 2019, ICML.
[4] Stefano Coniglio,et al. Methods for Finding Leader-Follower Equilibria with Multiple Followers: (Extended Abstract) , 2016, AAMAS.
[5] Meixia Tao,et al. Caching incentive design in wireless D2D networks: A Stackelberg game approach , 2016, 2016 IEEE International Conference on Communications (ICC).
[6] Michael P. Wellman,et al. Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..
[7] Marcin Andrychowicz,et al. Solving Rubik's Cube with a Robot Hand , 2019, ArXiv.
[8] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[9] Lin F. Yang,et al. Minimax Sample Complexity for Turn-based Stochastic Game , 2020, UAI.
[10] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[11] Zhuoran Yang,et al. Is Pessimism Provably Efficient for Offline RL? , 2020, ICML.
[12] Qinghua Liu,et al. A Sharp Analysis of Model-based Reinforcement Learning with Self-Play , 2020, ICML.
[13] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[14] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation , 2019, COLT.
[15] David C. Parkes,et al. The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies , 2020, ArXiv.
[16] Huan Wang,et al. Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games , 2021, NeurIPS.
[17] Zhuoran Yang,et al. A Theoretical Analysis of Deep Q-Learning , 2019, L4DC.
[18] Tiancheng Yu,et al. Provably Efficient Online Agnostic Learning in Markov Games , 2020, ArXiv.
[19] Thomas P. Hayes,et al. Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.
[20] Quanquan Gu,et al. Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation , 2021, ArXiv.
[21] Tamer Basar,et al. Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games , 2022, Dynamic Games and Applications.
[22] Keith B. Hall,et al. Correlated Q-Learning , 2003, ICML.
[23] Stefano Coniglio,et al. Computing a Pessimistic Stackelberg Equilibrium with Multiple Followers: The Mixed-Pure Case , 2019, Algorithmica.
[24] Pingzhong Tang,et al. Learning Optimal Strategies to Commit To , 2019, AAAI.
[25] T. Başar,et al. Dynamic Noncooperative Game Theory , 1982 .
[26] Michail G. Lagoudakis,et al. Value Function Approximation in Zero-Sum Markov Games , 2002, UAI.
[27] Vincent Conitzer,et al. Computing the optimal strategy to commit to , 2006, EC '06.
[28] Stefano Coniglio,et al. Bilevel Programming Approaches to the Computation of Optimistic and Pessimistic Single-Leader-Multi-Follower Equilibria , 2017, SEA.
[29] Jose B. Cruz,et al. Stackelberg strategies and incentives in multiperson deterministic decision problems , 1984, IEEE Transactions on Systems, Man, and Cybernetics.
[30] Milind Tambe,et al. Security and Game Theory - Algorithms, Deployed Systems, Lessons Learned , 2011 .
[31] Sergey Levine,et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction , 2019, NeurIPS.
[32] Chi Jin,et al. Near-Optimal Reinforcement Learning with Self-Play , 2020, NeurIPS.
[33] Peter Bro Miltersen,et al. Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor , 2010, JACM.
[34] Fernando Ordóñez,et al. Stationary Strong Stackelberg Equilibrium in Discounted Stochastic Games , 2019, IEEE Transactions on Automatic Control.
[35] Chi Jin,et al. V-Learning - A Simple, Efficient, Decentralized Algorithm for Multiagent RL , 2021, ArXiv.
[36] Matthew E. Taylor,et al. A survey and critique of multiagent deep reinforcement learning , 2018, Autonomous Agents and Multi-Agent Systems.
[37] Chi Jin,et al. Provable Self-Play Algorithms for Competitive Reinforcement Learning , 2020, ICML.
[38] Y. Narahari,et al. Design of Incentive Compatible Mechanisms for Stackelberg Problems , 2005, WINE.
[39] Mengdi Wang,et al. Feature-Based Q-Learning for Two-Player Stochastic Games , 2019, ArXiv.
[40] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[41] Agnieszka Wiszniewska-Matyszkiel,et al. Dynamic Stackelberg duopoly with sticky prices and a myopic follower , 2021, Operational Research.
[42] Peter Stone,et al. Reinforcement learning , 2019, Scholarpedia.
[43] Thorsten Joachims,et al. MOReL : Model-Based Offline Reinforcement Learning , 2020, NeurIPS.
[44] Akshay Krishnamurthy,et al. Reward-Free Exploration for Reinforcement Learning , 2020, ICML.
[45] Marc G. Bellemare,et al. The Importance of Pessimism in Fixed-Dataset Policy Optimization , 2020, ArXiv.
[46] Ming Jin,et al. Social game for building energy efficiency: Incentive design , 2014, 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[47] Banghua Zhu,et al. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism , 2021, IEEE Transactions on Information Theory.
[48] Vikash Kumar,et al. A Game Theoretic Framework for Model Based Reinforcement Learning , 2020, ICML.
[49] Saeed Ghadimi,et al. Approximation Methods for Bilevel Programming , 2018, ArXiv.
[50] Bernhard von Stengel,et al. Leadership games with convex strategy sets , 2010, Games Econ. Behav..
[51] Csaba Szepesvári,et al. Error Propagation for Approximate Policy and Value Iteration , 2010, NIPS.
[52] Afshin Oroojlooyjadid,et al. A review of cooperative multi-agent deep reinforcement learning , 2019, Applied Intelligence.
[53] Shie Mannor,et al. Optimistic Policy Optimization with Bandit Feedback , 2020, ICML.
[54] Chen-Yu Wei,et al. Online Reinforcement Learning in Stochastic Games , 2017, NIPS.
[55] Bart De Schutter,et al. A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[56] Qiaomin Xie,et al. Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium , 2020, COLT.
[57] Zhaoran Wang,et al. A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic , 2020, ArXiv.
[58] Nando de Freitas,et al. Critic Regularized Regression , 2020, NeurIPS.
[59] Fernando Ordóñez,et al. On the Value Iteration method for dynamic Strong Stackelberg Equilibria , 2019 .
[60] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[61] Mengdi Wang,et al. Model-Based Reinforcement Learning with Value-Targeted Regression , 2020, L4DC.
[62] Lillian J. Ratliff,et al. Adaptive Incentive Design , 2018, IEEE Transactions on Automatic Control.
[63] S. Levine,et al. Accelerating Online Reinforcement Learning with Offline Datasets , 2020, ArXiv.
[64] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[65] A Single-Timescale Stochastic Bilevel Optimization Method , 2021, ArXiv.
[66] Eric van Damme,et al. Non-Cooperative Games , 2000 .
[67] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[68] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[69] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[70] Natasha Jaques,et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog , 2019, ArXiv.
[71] Lillian J. Ratliff,et al. Convergence of Learning Dynamics in Stackelberg Games , 2019, ArXiv.
[72] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[73] Mengdi Wang,et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features , 2019, ICML.
[74] Quanquan Gu,et al. Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes , 2020, COLT.
[75] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[76] Vincent Conitzer,et al. Complexity of Mechanism Design , 2002, UAI.
[77] A. Haurie,et al. Sequential Stackelberg equilibria in two-person games , 1985 .
[78] Shie Mannor,et al. Regularized Policy Iteration with Nonparametric Function Spaces , 2016, J. Mach. Learn. Res..
[79] Nan Jiang,et al. $Q^\star$ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison , 2020, ArXiv.
[80] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[81] Bruno Scherrer,et al. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games , 2015, ICML.
[82] J. Filar,et al. Competitive Markov Decision Processes , 1996 .
[83] Ariel D. Procaccia,et al. Learning Optimal Commitment to Overcome Insecurity , 2014, NIPS.
[84] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[85] Yongdong Wu,et al. Incentive Mechanism Design for Heterogeneous Peer-to-Peer Networks: A Stackelberg Game Approach , 2014, IEEE Transactions on Mobile Computing.
[86] Suresh P. Sethi,et al. A review of dynamic Stackelberg game models , 2016 .
[87] Noah Golowich,et al. Independent Policy Gradient Methods for Competitive Reinforcement Learning , 2021, NeurIPS.
[89] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[90] Pierre Baldi,et al. Solving the Rubik’s cube with deep reinforcement learning and search , 2019, Nature Machine Intelligence.
[91] Lin F. Yang,et al. Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal , 2020, COLT.
[92] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.