Conditional Expectation based Value Decomposition for Scalable On-Demand Ride Pooling

Owing to its benefits for customers (lower prices), drivers (higher revenues), aggregation companies (higher revenues) and the environment (fewer vehicles), on-demand ride pooling (e.g., Uber pool, Grab Share) has become quite popular. Because matching vehicles to combinations of requests is computationally expensive, traditional ride pooling approaches are myopic: they do not consider the impact of current matches on the future value for vehicles/drivers. Recently, Neural Approximate Dynamic Programming (NeurADP) has employed value decomposition with Approximate Dynamic Programming (ADP) to outperform leading approaches by considering the impact of an individual agent’s (vehicle's) chosen actions on the future value of that agent. However, to ensure scalability and facilitate city-scale ride pooling, NeurADP completely ignores the impact of other agents' actions on the value of an individual agent/vehicle. As our experimental results demonstrate, ignoring this impact can significantly reduce overall performance when competition among vehicles for demand increases. Our key contribution is a novel mechanism that computes conditional expectations through joint conditional probabilities to capture dependencies on other agents' actions without increasing the complexity of training or decision making. We show that our new approach, Conditional Expectation based Value Decomposition (CEVD), outperforms NeurADP by up to 9.76% in terms of overall requests served, a significant improvement on a city-wide benchmark taxi dataset.
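
The abstract does not spell out the mechanism, so the following is only a generic sketch of what a conditional expectation computed from a joint probability table looks like; it is not the CEVD formulation. The function name, the discretized value levels, the action labels and the probabilities are all hypothetical and chosen purely for illustration. It computes E[V | a] = sum_v v * P(v, a) / sum_v P(v, a), where V is one vehicle's value and a is a neighbouring vehicle's action.

```python
def conditional_expectation(joint):
    """Return E[value | other_action] for each other_action.

    `joint` maps (value, other_action) -> joint probability P(value, other_action).
    All names and numbers here are illustrative assumptions, not taken from the paper.
    """
    marginal = {}   # P(other_action)
    weighted = {}   # sum over values of value * P(value, other_action)
    for (value, action), p in joint.items():
        marginal[action] = marginal.get(action, 0.0) + p
        weighted[action] = weighted.get(action, 0.0) + value * p
    # Divide by the marginal; skip actions that never occur.
    return {a: weighted[a] / marginal[a] for a in marginal if marginal[a] > 0}


# Hypothetical joint distribution: a vehicle's value is lower, on average,
# when a neighbouring vehicle competes for demand in the same zone.
joint = {
    (10.0, "serve_same_zone"): 0.10,
    (2.0, "serve_same_zone"): 0.30,
    (10.0, "serve_other_zone"): 0.45,
    (2.0, "serve_other_zone"): 0.15,
}

expected = conditional_expectation(joint)
for action, value in expected.items():
    print(action, round(value, 3))
```

In this toy table the expected value conditioned on the neighbour competing in the same zone is lower than when it serves a different zone, which is the kind of dependency an approach that ignores other agents' actions cannot represent.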
