E-commerce warehousing: learning a storage policy

E-commerce with major online retailers is changing the way people consume. The goal of increasing delivery speed while remaining cost-effective poses significant new challenges for supply chains as they race to satisfy growing and fast-changing demand. In this paper, we consider a warehouse with a Robotic Mobile Fulfillment System (RMFS), in which a fleet of robots stores and retrieves shelves of items and brings them to human pickers. To adapt to changing demand, uncertainty, and differentiated service (e.g., prime vs. regular), one can dynamically modify the storage allocation of a shelf. The objective is to define a dynamic storage policy that minimises the average cycle time the robots need to fulfil requests. We formulate this system as a Partially Observable Markov Decision Process and use a Deep Q-learning agent, a Reinforcement Learning method, to learn an efficient real-time storage policy that leverages repeated experiences and insightful forecasts through simulations. Additionally, we develop a rollout strategy that enhances our method by leveraging more of the information available at a given time step. In simulations comparing our method with traditional storage rules used in the industry, preliminary results show improvements of up to 14% in travelling times.
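As a rough illustration of the learning rule that Deep Q-learning builds on, the sketch below applies one standard Q-learning update to a toy storage decision. It is not the paper's implementation: a lookup table stands in for the neural network, and the state names, action names, reward value, and the `q_update` helper are all hypothetical, chosen only to show how a negative cycle time can serve as the reward.

```python
# Toy illustration only: tabular Q-learning stands in for the Deep Q-network,
# and every state/action name below is a hypothetical placeholder.

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """Apply one Q-learning update in place and return the new Q-value."""
    # Value of the best action available in the next state.
    best_next = max(q[next_state].values())
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Q-table for a shelf leaving a picking station: store it near or far.
q = {
    "shelf_returning": {"store_near": 0.0, "store_far": 0.0},
    "shelf_stored": {"store_near": 0.0, "store_far": 0.0},
}

# Reward is the negative cycle time (an assumed 12.0 time units here),
# so maximising the return corresponds to minimising cycle time.
new_value = q_update(
    q, "shelf_returning", "store_near", reward=-12.0, next_state="shelf_stored"
)
```

In a Deep Q-learning setting, the table `q` would be replaced by a network that maps an observation of the warehouse to one Q-value per storage action, trained towards the same temporal-difference target.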
