Efficient Collaborative Multi-Agent Deep Reinforcement Learning for Large-Scale Fleet Management

Large-scale online ride-sharing platforms have substantially transformed our lives by reallocating transportation resources to alleviate traffic congestion and promote transportation efficiency. An efficient fleet management strategy can not only significantly improve the utilization of transportation resources but also increase revenue and customer satisfaction. However, designing an effective fleet management strategy is challenging, because the strategy must adapt to an environment with complex dynamics between demand and supply. Existing studies usually work on a simplified problem setting that can hardly capture the complicated stochastic demand-supply variations in a high-dimensional space. In this paper, we tackle the large-scale fleet management problem with reinforcement learning and propose a contextual multi-agent reinforcement learning framework comprising three concrete algorithms, which achieve coordination among a large number of agents and adapt to different contexts. Extensive empirical studies show that the proposed framework significantly outperforms state-of-the-art approaches.
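To make the coordination problem concrete, here is a hypothetical toy sketch. It is not the framework proposed in this paper and it involves no learning. Cells on a line hold open orders, each idle vehicle may stay or move to an adjacent cell, and a cell serves min(vehicles, orders). All names (`reachable`, `served_orders`, `independent_policy`, `coordinated_policy`) and all numbers are illustrative assumptions. The sketch contrasts vehicles that each chase the busiest reachable cell with vehicles that account for orders already claimed by earlier vehicles.

```python
from collections import Counter
from typing import List

# Hypothetical toy setting (not the paper's simulator or algorithms):
# cells on a line; each vehicle may stay or move one cell left/right per step.


def reachable(cell: int, num_cells: int) -> List[int]:
    """Cells a vehicle in `cell` can reach in one step (left, stay, right)."""
    return [c for c in (cell - 1, cell, cell + 1) if 0 <= c < num_cells]


def served_orders(positions: List[int], demand: List[int]) -> int:
    """Orders served: in each cell, min(vehicles present, orders waiting)."""
    supply = Counter(positions)
    return sum(min(supply[c], d) for c, d in enumerate(demand))


def independent_policy(vehicles: List[int], demand: List[int]) -> List[int]:
    """Each vehicle greedily picks the reachable cell with the most orders,
    ignoring what every other vehicle does."""
    n = len(demand)
    return [max(reachable(v, n), key=lambda c: demand[c]) for v in vehicles]


def coordinated_policy(vehicles: List[int], demand: List[int]) -> List[int]:
    """Hand-written sequential greedy heuristic: each vehicle looks at orders
    not yet claimed by earlier vehicles, so the fleet spreads out."""
    n = len(demand)
    remaining = list(demand)
    targets = []
    for v in vehicles:
        best = max(reachable(v, n), key=lambda c: remaining[c])
        remaining[best] -= 1  # negative means the cell is already oversupplied
        targets.append(best)
    return targets


if __name__ == "__main__":
    demand = [1, 3, 1, 0, 2]    # open orders per cell (made-up numbers)
    vehicles = [1, 1, 1, 2, 3]  # current cell of each idle vehicle
    print(served_orders(independent_policy(vehicles, demand), demand))  # 4
    print(served_orders(coordinated_policy(vehicles, demand), demand))  # 5
```

In this toy instance, the independent vehicles all converge on the busiest cell and oversupply it, while the coordinated rule serves one more order with the same fleet. The coordinated rule here is only a fixed heuristic standing in for coordination; a reinforcement learning approach would instead learn such behavior from reward, which this sketch does not attempt.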
