Multi-time scale Markov decision process approach to strategic network growth of reverse supply chains

This paper addresses the complex set of decisions surrounding the growth over time of reverse supply chain networks, which collect used products for reuse, refurbishment, and/or recycling by processors. The collection network growth problem is decomposed into strategic, tactical, and operational problems. This paper focuses on the strategic problem: how to allocate capital budget resources effectively so that the network grows to meet long-term collection targets while satisfying collection cost constraints. We model the strategic problem as a Markov decision process, which can also be posed as a multi-time scale Markov decision problem. The tactical-level recruitment problem appears as a sub-problem of the strategic model. Using dynamic programming, linear programming, and Q-Learning approaches, we implement a heuristic to solve realistically sized problems. A numerical study demonstrates that the heuristic obtains a good solution to the large-scale problem in reasonable time, whereas the exact DP approach cannot compute the optimal solution at that scale.
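To make the contrast between exact DP and Q-Learning concrete, the following is a minimal sketch on a toy network-growth MDP. It is not the paper's model or heuristic. The state (number of active collection sites), the action (recruitment attempts funded per period), the reward, and every constant and function name are hypothetical choices made only for illustration, and the collection targets and cost constraints of the strategic problem are omitted.

```python
import math
import random

# All constants below are illustrative assumptions, not values from the paper.
N_SITES = 5        # maximum network size (states are 0..N_SITES)
MAX_ATTEMPTS = 2   # cap on recruitment attempts funded per period
P_SUCCESS = 0.6    # probability that one attempt recruits a site
REVENUE = 1.0      # per-period value of each active collection site
COST = 0.8         # capital cost of one recruitment attempt
GAMMA = 0.9        # discount factor


def transition(s, a):
    """Return {next_state: probability} when funding `a` attempts from size `s`."""
    probs = {}
    for k in range(a + 1):
        p = math.comb(a, k) * P_SUCCESS**k * (1 - P_SUCCESS) ** (a - k)
        nxt = min(N_SITES, s + k)  # the network cannot exceed N_SITES
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs


def reward(s, a):
    """One-period reward: value of current sites minus recruitment spending."""
    return REVENUE * s - COST * a


def value_iteration(tol=1e-10):
    """Exact DP: sweep all states and actions until the values stop changing."""
    V = [0.0] * (N_SITES + 1)
    while True:
        Q = [
            [
                reward(s, a)
                + GAMMA * sum(p * V[n] for n, p in transition(s, a).items())
                for a in range(MAX_ATTEMPTS + 1)
            ]
            for s in range(N_SITES + 1)
        ]
        new_V = [max(row) for row in Q]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            policy = [row.index(max(row)) for row in Q]
            return new_V, policy
        V = new_V


def q_learning(steps=20000, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-Learning: learn from sampled transitions, no model sweep."""
    rng = random.Random(seed)
    Q = [[0.0] * (MAX_ATTEMPTS + 1) for _ in range(N_SITES + 1)]
    s = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            a = rng.randrange(MAX_ATTEMPTS + 1)
        else:
            a = Q[s].index(max(Q[s]))
        # Sample the outcome of each funded recruitment attempt.
        successes = sum(rng.random() < P_SUCCESS for _ in range(a))
        nxt = min(N_SITES, s + successes)
        target = reward(s, a) + GAMMA * max(Q[nxt])
        Q[s][a] += alpha * (target - Q[s][a])
        # Restart from an empty network once full so small states keep being visited.
        s = 0 if nxt == N_SITES else nxt
    return Q
```

In this toy setting value_iteration() enumerates every state-action pair on each sweep, which is the step that becomes impractical as the state space grows, while q_learning() only touches the states it samples. Comparing the greedy actions from the two tables is one simple way to judge how close the learned policy is to the exact one.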
