Learning to Minimize Cost to Serve for Multi-Node Multi-Product Order Fulfilment in Electronic Commerce

We describe a novel decision-making problem that arises from the demands of retail electronic commerce (e-commerce). While working with business collaborators in the logistics and retail industries, we found that minimizing the cost of delivering products from the most opportune node in the supply chain (a quantity called the cost-to-serve, or CTS) is a key challenge. The large scale, high stochasticity, and wide geographical spread of e-commerce supply chains make this setting well suited to a carefully designed, data-driven decision-making algorithm. In this preliminary work, we focus on the specific subproblem of delivering multiple products in arbitrary quantities from any warehouse to multiple customers in each time period. We compare the relative performance and computational efficiency of several baselines, including heuristics and mixed-integer linear programming. We show that a reinforcement-learning-based algorithm is competitive with these policies and has the potential to scale up efficiently in real-world settings.
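To make the subproblem concrete, the following is a minimal sketch of one possible heuristic baseline for a single time period. It is not the algorithm evaluated in this work. The function name `greedy_fulfil`, the data layout, and the linear per-unit shipping cost are all assumptions made for illustration. Each order is served from the cheapest warehouse that still holds stock, and is split across warehouses when one warehouse cannot cover the full quantity.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout (an assumption, not taken from the paper):
#   inventory: warehouse -> product -> units on hand
#   unit_cost: (warehouse, customer) -> cost of shipping one unit
#   orders:    list of (customer, product, quantity) for one time period
Shipment = Tuple[str, str, str, int]  # (warehouse, customer, product, units)


def greedy_fulfil(
    inventory: Dict[str, Dict[str, int]],
    unit_cost: Dict[Tuple[str, str], float],
    orders: List[Tuple[str, str, int]],
) -> Tuple[List[Shipment], float, int]:
    """Serve each order from the cheapest warehouses that have stock."""
    # Copy the inventory so the caller's data is not modified.
    stock = {w: dict(products) for w, products in inventory.items()}
    shipments: List[Shipment] = []
    total_cost = 0.0
    unmet = 0

    for customer, product, quantity in orders:
        remaining = quantity
        # Visit warehouses in order of increasing per-unit cost to this customer.
        for w in sorted(stock, key=lambda name: unit_cost[(name, customer)]):
            if remaining == 0:
                break
            take = min(remaining, stock[w].get(product, 0))
            if take > 0:
                stock[w][product] -= take
                shipments.append((w, customer, product, take))
                total_cost += take * unit_cost[(w, customer)]
                remaining -= take
        # Demand that no warehouse can cover is counted as unmet.
        unmet += remaining

    return shipments, total_cost, unmet


if __name__ == "__main__":
    inventory = {"W1": {"A": 3}, "W2": {"A": 5}}
    unit_cost = {("W1", "C1"): 1.0, ("W2", "C1"): 2.0}
    orders = [("C1", "A", 4)]
    print(greedy_fulfil(inventory, unit_cost, orders))
```

A greedy rule of this kind processes orders one at a time and ignores how the stock it consumes affects later orders. That myopia is one reason to compare such heuristics against joint methods such as mixed-integer linear programming or a learned policy.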
