Deep Q-learning for same-day delivery with vehicles and drones

Abstract. We consider same-day delivery with vehicles and drones. Customers request deliveries over the course of the day, and a dispatcher dynamically dispatches vehicles and drones to deliver the goods before each customer's delivery deadline. Vehicles can deliver multiple packages in one route but travel relatively slowly due to urban traffic. Drones travel faster, but they have limited capacity and require charging or battery swaps. To exploit the different strengths of the two fleets, we propose a deep Q-learning approach. Our method learns the value of assigning a new customer to either drones or vehicles, as well as the value of not offering service at all. In a systematic computational analysis, we show that our policy outperforms benchmark policies and that the deep Q-learning approach is effective. We also show that combining state and action features is very valuable and that our policy remains effective when demand data and fleet size change moderately.
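
The decision described above, choosing among vehicle delivery, drone delivery, and no service for each new request, can be illustrated with a small Q-learning sketch. This is not the paper's implementation: a linear function per action stands in for the deep network, the feature vector is a hypothetical placeholder, and the reward of 1 per served customer, the learning rate, and the discount factor are all assumptions made for illustration.

```python
import random

# The three options named in the abstract.
ACTIONS = ("vehicle", "drone", "reject")


class LinearQ:
    """Linear stand-in for a deep Q-network (an assumption made for brevity)."""

    def __init__(self, n_features, lr=0.01, gamma=1.0):
        # One weight vector per action; all Q-values start at zero.
        self.w = {a: [0.0] * n_features for a in ACTIONS}
        self.lr = lr
        self.gamma = gamma

    def q(self, features, action):
        """Estimated value of taking `action` in the state described by `features`."""
        return sum(w * x for w, x in zip(self.w[action], features))

    def best(self, features, feasible):
        """Greedy choice restricted to the currently feasible actions."""
        return max(feasible, key=lambda a: self.q(features, a))

    def act(self, features, feasible, epsilon=0.1, rng=random):
        """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
        if rng.random() < epsilon:
            return rng.choice(list(feasible))
        return self.best(features, feasible)

    def update(self, features, action, reward, next_features=None, next_feasible=()):
        """One Q-learning step toward reward + gamma * max_a' Q(s', a')."""
        target = reward
        if next_features is not None and next_feasible:
            target += self.gamma * max(
                self.q(next_features, a) for a in next_feasible
            )
        td_error = target - self.q(features, action)
        self.w[action] = [
            w + self.lr * td_error * x for w, x in zip(self.w[action], features)
        ]
        return td_error


# Hypothetical features: [bias, fraction of the day remaining].
model = LinearQ(n_features=2, lr=0.1)
state = [1.0, 0.5]
model.update(state, "drone", reward=1.0)  # assumed reward: one served customer
choice = model.best(state, ACTIONS)       # greedy decision for this request
```

Restricting the maximum to a `feasible` set reflects that a fleet may be unable to meet a deadline for a given request, while rejecting remains available; how feasibility is determined is left outside this sketch.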
