DROP: Deep relocating option policy for optimal ride-hailing vehicle repositioning

In a ride-hailing system, optimal relocation of vacant vehicles can significantly reduce fleet idling time and balance the supply-demand distribution, enhancing system efficiency and promoting driver satisfaction and retention. Model-free deep reinforcement learning (DRL) has been shown to dynamically learn the relocating policy by actively interacting with the intrinsic dynamics of large-scale ride-hailing systems. However, sparse reward signals and an unbalanced demand and supply distribution pose critical barriers to developing effective DRL models. Conventional exploration strategies (e.g., ε-greedy) may barely work in such an environment because agents dither in low-demand regions distant from high-revenue regions. This study proposes the deep relocating option policy (DROP), which supervises vehicle agents to escape from oversupply areas and effectively relocate to potentially underserved areas. We propose to learn the Laplacian embedding of a time-expanded relocation graph as an approximate representation of the system relocation policy. The embedding generates task-agnostic signals which, in combination with task-dependent signals, constitute the pseudo-reward function for generating DROPs. We present a hierarchical learning framework that trains a high-level relocation policy and a set of low-level DROPs. The effectiveness of our approach is demonstrated using a custom-built high-fidelity simulator with real-world trip record data. We report that DROP significantly improves on baseline models, yielding 15.7% more hourly revenue, and can effectively resolve the dithering issue in low-demand areas.
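To make the idea of a Laplacian embedding over a time-expanded relocation graph more concrete, the following is a minimal sketch on a toy graph. It is not the method described above: the embedding here comes from an exact eigendecomposition rather than being learned, and the function names (`time_expanded_adjacency`, `laplacian_embedding`, `pseudo_reward`), the zone adjacency, the `weight` parameter, and the difference-of-embeddings form of the task-agnostic signal are all assumptions made for illustration.

```python
import numpy as np

def time_expanded_adjacency(zone_adj, num_steps):
    """Symmetric adjacency over nodes (t, z), indexed as t * num_zones + z.

    Assumption: a vehicle in zone z at step t can stay or move to an
    adjacent zone by step t + 1.
    """
    num_zones = zone_adj.shape[0]
    n = num_steps * num_zones
    A = np.zeros((n, n))
    for t in range(num_steps - 1):
        for z in range(num_zones):
            for z_next in range(num_zones):
                if z == z_next or zone_adj[z, z_next]:
                    i = t * num_zones + z
                    j = (t + 1) * num_zones + z_next
                    A[i, j] = A[j, i] = 1.0
    return A

def laplacian_embedding(A, dim):
    """Embed each node with the `dim` smallest non-trivial eigenvectors of L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    # Skip the first (constant) eigenvector; valid when the graph is connected.
    return eigvecs[:, 1:dim + 1]

def pseudo_reward(phi, s, s_next, k, task_reward, weight):
    """Hypothetical pseudo-reward: task-agnostic embedding change along
    dimension k plus a weighted task-dependent signal (e.g., trip revenue)."""
    return (phi[s_next, k] - phi[s, k]) + weight * task_reward

# Toy example: three zones on a line (0 - 1 - 2), four time steps.
zones = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])
A = time_expanded_adjacency(zones, num_steps=4)
phi = laplacian_embedding(A, dim=2)
# Reward for moving from (t=0, zone 0) to (t=1, zone 1) with no trip revenue.
r = pseudo_reward(phi, s=0, s_next=4, k=0, task_reward=0.0, weight=0.5)
```

In this sketch, each embedding dimension `k` would define one candidate option whose pseudo-reward pushes an agent along that direction of the graph; on a graph of realistic size, the eigendecomposition would typically be replaced by a learned approximation.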
