Solving the Order Batching and Sequencing Problem using Deep Reinforcement Learning

In e-commerce markets, on-time delivery is critical to customer satisfaction. In this paper, we present a Deep Reinforcement Learning (DRL) approach for deciding how and when orders should be batched and picked in a warehouse so as to minimize the number of tardy orders. In particular, the approach decides whether an order should be picked individually (pick-by-order) or in a batch with other orders (pick-by-batch) and, if batched, with which other orders. We formulate the problem as a semi-Markov decision process and develop a vector-based state representation that captures the characteristics of the warehouse system. This allows us to build a DRL solution that learns a strategy by interacting with the environment and is trained with a proximal policy optimization algorithm. We evaluate the proposed DRL approach by comparing it with several batching and sequencing heuristics in different problem settings. The results show that the DRL approach learns a strategy that produces consistently good solutions and outperforms these heuristics.
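
To make the trade-off between pick-by-order and pick-by-batch concrete, the following minimal Python sketch counts tardy orders for three candidate plans. It is an illustration only and not the model used in the paper: the single picker, the fixed per-tour overhead `TRAVEL_TIME`, the `Order` fields, and all numeric values are hypothetical assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Order:
    name: str
    pick_time: float  # hypothetical time needed to pick this order's items
    due: float        # hypothetical due time of the order


# Hypothetical fixed travel overhead incurred once per picking tour.
TRAVEL_TIME = 4.0


def count_tardy(batches: Sequence[Sequence[Order]]) -> int:
    """Count tardy orders when one picker processes the batches in sequence.

    Assumption: every order in a batch is completed when the whole
    batch is completed, and a tour costs TRAVEL_TIME plus the pick
    times of its orders.
    """
    clock = 0.0
    tardy = 0
    for batch in batches:
        clock += TRAVEL_TIME + sum(o.pick_time for o in batch)
        tardy += sum(1 for o in batch if clock > o.due)
    return tardy


orders = [
    Order("A", pick_time=2.0, due=7.0),
    Order("B", pick_time=2.0, due=14.0),
    Order("C", pick_time=2.0, due=14.0),
]

pick_by_order = [[o] for o in orders]   # three separate tours
pick_by_batch = [orders]                # one tour with all orders
mixed = [[orders[0]], orders[1:]]       # A alone, then B and C together

print(count_tardy(pick_by_order))  # 1 (C finishes at 18 > 14)
print(count_tardy(pick_by_batch))  # 1 (A finishes at 10 > 7)
print(count_tardy(mixed))          # 0
```

In this toy instance neither pure strategy avoids tardiness, while a mixed plan does. This is the kind of decision a learned policy would have to make, here reduced to a fixed, fully known set of orders.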
