Simulation-Based Approximate Policy Iteration with Generalized Logistic Functions

We present an approximate dynamic programming method based on simulation, policy iteration, a postdecision state formulation, and a logistic value function approximation. We developed this method to determine whether nonlinear value function approximations can yield cost-effective policies for advance patient scheduling problems, and to identify the main advantages and disadvantages of using simulation rather than linear programming to approximately solve dynamic capacity allocation problems. We first apply the proposed method to a queueing problem and then study a more practical application based on an advance multipriority patient scheduling problem. We investigate the quality and practical implications of the resulting appointment scheduling policies using simulation and compare their performance with that of four other policies. Patient scheduling policies obtained by the new method depend not only on the number of appointments already booked on each day but also on the overall system workload. In particular, these policies yield lower discounted costs and shorter average wait times for higher-priority patients than policies obtained directly from linear programming with an affine value function approximation in the predecision state variables.
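To make the core idea concrete, the following is a minimal sketch (not the paper's actual algorithm or notation) of how a generalized logistic value function approximation can be fit to simulated cost-to-go samples. All names here (`logistic_vfa`, `fit_logistic_vfa`, the scalar workload feature `x`) are our own illustrative assumptions; the paper's method operates on postdecision states within a policy iteration loop, of which this least-squares fitting step is only one component.

```python
import numpy as np

def logistic_vfa(x, L, U, k, x0):
    """Generalized (four-parameter) logistic value function approximation:
    V(x) = L + (U - L) / (1 + exp(-k * (x - x0))),
    where x is a scalar workload feature of the postdecision state."""
    return L + (U - L) / (1.0 + np.exp(-k * (x - x0)))

def fit_logistic_vfa(x, v, iters=20000, lr=0.05):
    """Fit (L, U, k, x0) to simulated samples (x, v) by gradient descent
    on the mean squared prediction error."""
    # Heuristic initialization: asymptotes from the sample range,
    # midpoint at the mean feature value, unit slope.
    L, U, k, x0 = v.min(), v.max(), 1.0, x.mean()
    for _ in range(iters):
        s = 1.0 / (1.0 + np.exp(-k * (x - x0)))   # logistic factor in (0, 1)
        err = L + (U - L) * s - v                  # prediction residuals
        # Partial derivatives of the loss w.r.t. each parameter (chain rule
        # through the logistic; the factor of 2 is absorbed into lr).
        dL = np.mean(err * (1.0 - s))
        dU = np.mean(err * s)
        ds = err * (U - L) * s * (1.0 - s)
        dk = np.mean(ds * (x - x0))
        dx0 = np.mean(-ds * k)
        L -= lr * dL
        U -= lr * dU
        k -= lr * dk
        x0 -= lr * dx0
    return L, U, k, x0

# Demo on synthetic data standing in for noisy simulated cost-to-go estimates.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 400)                       # workload feature samples
v = logistic_vfa(x, 2.0, 12.0, 1.5, 5.0)              # "true" values
v = v + rng.normal(0.0, 0.1, x.size)                  # simulation noise
L, U, k, x0 = fit_logistic_vfa(x, v)
rmse = np.sqrt(np.mean((logistic_vfa(x, L, U, k, x0) - v) ** 2))
```

In a full simulation-based policy iteration scheme, this fit would be the policy evaluation step: cost-to-go samples are generated by simulating the current policy, the logistic parameters are refit, and the policy is then improved greedily with respect to the updated approximation. The S-shape is what lets the approximation capture workload-dependent behavior that an affine approximation cannot.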
