Minimising Regret in Route Choice

The use of reinforcement learning (RL) in multiagent scenarios is challenging. I consider the route choice problem, where drivers must choose routes that minimise their travel times. Here, selfish RL agents must adapt to each other's decisions. In this work, I show how the agents can learn, with performance guarantees, by minimising the regret associated with their decisions, thus achieving the User Equilibrium (UE). Since the UE is inefficient from a global perspective, I also focus on bridging the gap between the UE and the system optimum. In contrast to previous approaches, this work drops any full-knowledge assumption.
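To make the regret-minimisation idea concrete, below is a minimal sketch of a regret-matching driver choosing among a fixed set of routes. The class name, the toy two-route network, and the linear latency functions are illustrative assumptions, not the paper's setup; the sketch also assumes each driver can observe (or estimate) the travel time of every route after each episode, whereas the paper's contribution is precisely to avoid such full-knowledge assumptions.

```python
import random


class RegretMatchingDriver:
    """Illustrative regret-matching learner over a fixed set of routes.

    Assumes the travel time of every route is available (or estimated)
    after each episode; the paper's actual estimation scheme without
    full knowledge may differ.
    """

    def __init__(self, num_routes):
        self.num_routes = num_routes
        self.cumulative_regret = [0.0] * num_routes

    def choose_route(self):
        # Play proportionally to positive cumulative regret;
        # fall back to a uniform choice when all regrets are non-positive.
        positive = [max(r, 0.0) for r in self.cumulative_regret]
        total = sum(positive)
        if total <= 0.0:
            return random.randrange(self.num_routes)
        pick = random.uniform(0.0, total)
        acc = 0.0
        for route, weight in enumerate(positive):
            acc += weight
            if pick <= acc:
                return route
        return self.num_routes - 1

    def update(self, chosen_route, costs):
        # Regret of each alternative = travel time actually paid minus the
        # travel time that alternative would have incurred (lower is better).
        paid = costs[chosen_route]
        for route in range(self.num_routes):
            self.cumulative_regret[route] += paid - costs[route]


if __name__ == "__main__":
    # Toy two-route network with congestion-dependent (linear) latencies.
    drivers = [RegretMatchingDriver(num_routes=2) for _ in range(100)]
    loads = [0, 0]
    for episode in range(500):
        choices = [d.choose_route() for d in drivers]
        loads = [choices.count(r) for r in range(2)]
        # Route 1 has a higher free-flow time but congests more slowly.
        costs = [loads[0] / 100.0, 0.5 + loads[1] / 200.0]
        for driver, choice in zip(drivers, choices):
            driver.update(choice, costs)
    print("final route loads:", loads)
```

With these assumed latencies, the route loads tend towards the split at which both routes have equal travel times, i.e. the User Equilibrium of the toy network; it says nothing about the gap to the system optimum, which the paper addresses separately.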
