Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms

Large ride-hailing platforms, such as DiDi, Uber, and Lyft, connect tens of thousands of vehicles in a city to millions of ride requests throughout the day, holding great promise for improving transportation efficiency through order dispatching and vehicle repositioning. Existing studies, however, usually treat the two tasks in simplified settings that hardly address the complex interactions between them, the real-time fluctuations of supply and demand, and the coordination required by the large-scale nature of the problem. In this paper we propose a unified value-based dynamic learning framework (V1D3) for tackling both tasks. At the center of the framework is a globally shared value function that is updated continuously using online experiences generated from real-time platform transactions. To improve sample efficiency and robustness, we further propose a novel periodic ensemble method that combines fast online learning with a large-scale offline training scheme leveraging the abundant historical driver trajectory data. This allows the proposed framework to adapt quickly to the highly dynamic environment, to generalize robustly to recurrent patterns, and to drive implicit coordination among the population of managed vehicles. Extensive experiments on real-world datasets show considerable improvements over other recently proposed methods on both tasks. In particular, V1D3 outperforms the first-prize winners of both the dispatching and repositioning tracks of the KDD Cup 2020 RL competition, achieving state-of-the-art results on both total driver income and user-experience-related metrics.
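
To make the two ingredients named above concrete, the following is a minimal sketch, not the paper's actual implementation: a shared value network trained with one-step TD(0) updates on streaming transitions, plus a periodic ensemble step that blends in the weights of a copy pre-trained offline on historical trajectories. All names here (ValueNet, td_update, mix_ratio, and the state featurization) are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Maps a featurized driver state (e.g., grid cell + time of day) to a scalar value."""
    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s).squeeze(-1)

def td_update(v, optimizer, batch, gamma=0.99):
    """One-step TD(0) update on a batch of (s, r, s_next, done) transitions
    collected from real-time platform transactions (completed trips, idle moves)."""
    s, r, s_next, done = batch
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * v(s_next)  # bootstrap target
    loss = nn.functional.mse_loss(v(s), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def periodic_ensemble(online_v: ValueNet, offline_v: ValueNet, mix_ratio: float = 0.5):
    """Hypothetical ensemble step: blend weights of an offline-pretrained network
    into the online network, one simple way to combine fast online learning with
    large-scale offline training."""
    for p_on, p_off in zip(online_v.parameters(), offline_v.parameters()):
        p_on.mul_(1.0 - mix_ratio).add_(p_off, alpha=mix_ratio)
```

One rationale for a weight-mixing step of this kind: the online network can track intraday supply-demand shifts from fresh transitions, while the offline component anchors the shared value function to the recurrent daily patterns present in historical driver trajectories.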
