Paper information - An Online Learning Approach to a Multi-player N-armed Functional Bandit

An Online Learning Approach to a Multi-player N-armed Functional Bandit

Congestion games are guaranteed to admit at least one pure Nash equilibrium and have a rich history of practical use in transport modelling. In this paper we approach the problem of modelling equilibrium within congestion games using a decentralised, multi-player probabilistic approach based on stochastic bandit feedback. Restricting the strategies available to players under the assumption of bounded rationality, we explore an online multi-player exponential weights algorithm for unweighted atomic routing games and compare it with an \(\epsilon\)-greedy algorithm.
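To make the setting in the abstract concrete, the following is a minimal sketch of the two kinds of learner it names, not the paper's implementation. It assumes a toy network of parallel links whose cost is a hypothetical slope times the number of players on the link, and it does not model the bounded-rationality restriction on strategies. All function names, parameters (`gamma`, `eta`, `eps`, `slopes`) and default values are illustrative assumptions. Each player observes only the cost of the link it chose (bandit feedback).

```python
import math
import random


def link_costs(choices, slopes):
    """Loads and costs per link; cost = slope * load (assumed affine, unweighted players)."""
    loads = [0] * len(slopes)
    for c in choices:
        loads[c] += 1
    return loads, [s * n for s, n in zip(slopes, loads)]


def exp3_probs(weights, gamma):
    """Mix the normalised exponential weights with uniform exploration."""
    total = sum(weights)
    k = len(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]


def simulate_exp3(n_players=10, slopes=(1.0, 2.0), rounds=2000,
                  gamma=0.1, eta=0.05, seed=0):
    """Every player runs an independent exponential-weights (EXP3-style) learner."""
    rng = random.Random(seed)
    k = len(slopes)
    max_cost = max(slopes) * n_players  # used to scale losses into [0, 1]
    weights = [[1.0] * k for _ in range(n_players)]
    loads = [0] * k
    for _ in range(rounds):
        probs = [exp3_probs(w, gamma) for w in weights]
        choices = [rng.choices(range(k), weights=p)[0] for p in probs]
        loads, costs = link_costs(choices, slopes)
        for i, a in enumerate(choices):
            # Importance-weighted loss estimate for the chosen link only.
            est = (costs[a] / max_cost) / probs[i][a]
            weights[i][a] *= math.exp(-eta * est)
            # Renormalise so the weights do not all underflow.
            s = sum(weights[i])
            weights[i] = [w / s for w in weights[i]]
    return loads, [exp3_probs(w, gamma) for w in weights]


def simulate_eps_greedy(n_players=10, slopes=(1.0, 2.0), rounds=2000,
                        eps=0.1, seed=0):
    """Every player tracks a running mean cost per link and explores with probability eps."""
    rng = random.Random(seed)
    k = len(slopes)
    means = [[0.0] * k for _ in range(n_players)]
    counts = [[0] * k for _ in range(n_players)]
    loads = [0] * k
    for _ in range(rounds):
        choices = []
        for i in range(n_players):
            if rng.random() < eps:
                choices.append(rng.randrange(k))  # explore uniformly
            else:
                choices.append(min(range(k), key=lambda a: means[i][a]))  # exploit
        loads, costs = link_costs(choices, slopes)
        for i, a in enumerate(choices):
            counts[i][a] += 1
            means[i][a] += (costs[a] - means[i][a]) / counts[i][a]
    return loads


if __name__ == "__main__":
    final_loads, final_probs = simulate_exp3()
    print("EXP3 final-round loads:", final_loads)
    print("epsilon-greedy final-round loads:", simulate_eps_greedy())
```

Comparing the final-round loads of the two simulations against the load split that equalises link costs gives a rough sense of how close each learner gets to an equilibrium in this toy instance; the result depends on the assumed parameters and the random seed.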

[1] Johanne Cohen, et al. Learning with Bandit Feedback in Potential Games, 2017, NIPS.

[2] Alexandre M. Bayen, et al. Benchmarks for reinforcement learning in mixed-autonomy traffic, 2018, CoRL.

[3] R. Selten, et al. Bounded rationality: The adaptive toolbox, 2000.

[4] Luca Sanguinetti, et al. Online convex optimization and no-regret learning: Algorithms, guarantees and applications, 2018, ArXiv.

[5] Michael Patriksson, et al. The Traffic Assignment Problem: Models and Methods, 2015.

[6] T. Roughgarden. Algorithmic Game Theory: Routing Games, 2007.

[7] R. Rosenthal. A class of games possessing pure-strategy Nash equilibria, 1973.