A Linear Programming Approach to Nonstationary Infinite-Horizon Markov Decision Processes

Nonstationary infinite-horizon Markov decision processes (MDPs) generalize the most well-studied class of sequential decision models in operations research, namely stationary MDPs, by relaxing the restrictive assumption that problem data do not change over time. Linear programming (LP) has been very successful in yielding structural insights and solution methods for stationary MDPs. However, an LP approach for nonstationary MDPs is currently missing. This is because the LP formulation of a nonstationary infinite-horizon MDP has countably infinitely many variables and constraints, and research on such infinite-dimensional LPs has traditionally faced several hurdles. For instance, duality results may not hold; an extreme point may not be a basic feasible solution; and, in the context of a simplex algorithm, a pivot operation may require infinite data and computations, and a sequence of improving extreme points need not converge in value to the optimal value. In this paper, we tackle these challenges for an infinite-dimensional LP formulation of nonstationary infinite-horizon MDPs. We establish (1) weak and strong duality, (2) complementary slackness, (3) a basic feasible solution characterization of extreme points, and (4) a one-to-one correspondence between extreme points and deterministic Markovian policies, and (5) we devise a simplex algorithm. Pivots in this simplex algorithm use finite data, perform finite computations, and generate a sequence of improving extreme points that converges in value to the optimal value. Moreover, this sequence of extreme points gets arbitrarily close to the set of optimal extreme points. We also prove that, in early periods, the decisions prescribed by these extreme points are eventually exactly optimal in all states of the nonstationary infinite-horizon MDP.
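
To make concrete why the LP has countably many variables and constraints, the following is a minimal sketch of one standard occupation-measure formulation for a discounted-cost nonstationary MDP. All notation is assumed for illustration and may differ from the paper's: a finite state set S, a finite action set A, periods n = 1, 2, ..., a discount factor \alpha in (0,1), costs c_n(s,a), transition probabilities p_n(s' | s, a), and an initial state distribution \beta. The variable x_n(s,a) is read as the probability of occupying state s and choosing action a in period n.

```latex
% Sketch under assumed notation; not necessarily the paper's formulation.
\begin{align*}
\min_{x \ge 0}\quad
  & \sum_{n=1}^{\infty} \sum_{s \in S} \sum_{a \in A}
    \alpha^{\,n-1}\, c_n(s,a)\, x_n(s,a) \\
\text{s.t.}\quad
  & \sum_{a \in A} x_1(s,a) = \beta(s),
  && s \in S, \\
  & \sum_{a \in A} x_n(s,a)
    - \sum_{s' \in S} \sum_{a \in A} p_{n-1}(s \mid s', a)\, x_{n-1}(s', a) = 0,
  && s \in S,\ n \ge 2.
\end{align*}
```

There is one variable per triple (n, s, a) and one flow-balance constraint per pair (n, s), so both sets are countably infinite even though S and A are finite. In this sketch, a deterministic Markovian policy corresponds to a feasible x that, for each (n, s), puts positive mass on at most one action.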
