Learning in Congestion Games with Bandit Feedback

In this paper, we investigate Nash-regret minimization in congestion games, a class of games with benign theoretical structure and broad real-world applications. We first propose a centralized algorithm based on the principle of optimism in the face of uncertainty for congestion games with (semi-)bandit feedback, and obtain finite-sample guarantees. We then propose a decentralized algorithm via a novel combination of the Frank-Wolfe method and G-optimal design. By exploiting the structure of the congestion game, we show that the sample complexity of both algorithms depends only polynomially on the number of players and the number of facilities, but not on the size of the action set, which can be exponential in the number of facilities. We further define a new problem class, Markov congestion games, which allows us to model non-stationarity in congestion games. We propose a centralized algorithm for Markov congestion games whose sample complexity again depends only polynomially on all relevant problem parameters, and not on the size of the action set.
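
To make the facility-level structure concrete, below is a minimal, illustrative Python sketch, not the paper's algorithm: a toy congestion game with semi-bandit feedback and an optimistic (UCB-style) facility estimator. All names (play_round, update, ucb_estimates) and constants are hypothetical assumptions. It illustrates why, under semi-bandit feedback, a learner needs only one statistic per (facility, congestion level) pair, i.e., on the order of the number of facilities times the number of players, rather than one statistic per joint action, of which there can be exponentially many.

```python
import numpy as np

rng = np.random.default_rng(0)

n_players, n_facilities = 4, 6

# Ground-truth mean reward true_means[f, k] of facility f when k + 1 players
# use it; rewards are made decreasing in the congestion level.
true_means = rng.uniform(0.2, 1.0, size=(n_facilities, n_players))
true_means.sort(axis=1)
true_means = true_means[:, ::-1]

# Running statistics for the optimistic estimates: only m * n entries.
counts = np.zeros((n_facilities, n_players))
sums = np.zeros((n_facilities, n_players))


def play_round(actions):
    """actions[i] is the set of facilities chosen by player i.
    Returns semi-bandit feedback: each player observes a noisy reward
    for every facility it used, at the realized congestion level."""
    load = np.zeros(n_facilities, dtype=int)
    for a in actions:
        for f in a:
            load[f] += 1
    feedback = []
    for a in actions:
        obs = {}
        for f in a:
            k = load[f] - 1  # congestion-level index
            obs[f] = (k, true_means[f, k] + 0.1 * rng.standard_normal())
        feedback.append(obs)
    return feedback


def update(feedback):
    """Fold the observed per-facility rewards into the running statistics."""
    for obs in feedback:
        for f, (k, reward) in obs.items():
            counts[f, k] += 1
            sums[f, k] += reward


def ucb_estimates(t, bonus_scale=1.0):
    """Optimistic facility-level reward estimates (empirical mean + bonus)."""
    mean = np.divide(sums, np.maximum(counts, 1))
    bonus = bonus_scale * np.sqrt(np.log(t + 2) / np.maximum(counts, 1))
    return np.minimum(mean + bonus, 1.5)  # cap keeps unvisited pairs optimistic


# One round of uniform exploration, just to exercise the interface.
actions = [set(rng.choice(n_facilities, size=2, replace=False))
           for _ in range(n_players)]
update(play_round(actions))
print(ucb_estimates(t=1)[:2])
```

Turning such optimistic facility estimates into a joint action, whether centrally or in a decentralized fashion via the Frank-Wolfe method and G-optimal design, is where the paper's actual algorithms do the work; the sketch covers only the estimation side.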
