Solution and Forecast Horizons for Infinite-Horizon Nonhomogeneous Markov Decision Processes

We consider a nonhomogeneous infinite-horizon Markov Decision Process (MDP) problem with multiple optimal first-period policies. We seek an algorithm that, given finite data, delivers an optimal first-period policy. Such an algorithm can thus recursively generate, within a rolling-horizon procedure, an infinite-horizon optimal solution to the original problem. However, it can happen that no such algorithm exists, i.e., the MDP is not well posed. Equivalently, it is impossible to solve the problem with a finite amount of data. Assuming increasing marginal returns in actions (with respect to states) and stochastically increasing state transitions (with respect to actions), we provide an algorithm that is guaranteed to solve the given MDP whenever it is well posed. This algorithm determines, in finite time, a forecast horizon for which an optimal solution delivers an optimal first-period policy. As an application, we solve all well-posed instances of the time-varying version of the classic asset-selling problem.
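The rolling-horizon idea can be illustrated with a minimal sketch. It is not the paper's algorithm: it only runs backward induction on a T-period truncation and reports the set of optimal first-period actions in each state, so that the effect of lengthening the horizon can be observed. The function names, the zero terminal value, the discount factor, and the toy asset-selling instance (two offer levels, a holding cost that grows over time) are all assumptions made for illustration.

```python
def first_period_actions(T, n_states, n_actions, reward, trans, discount):
    """Backward induction on the T-period truncation (zero terminal value).

    reward(t, s, a) -> float, trans(t, s, a) -> list of next-state probabilities.
    Returns, for each state, the frozenset of actions optimal in period 0.
    """
    value = [0.0] * n_states          # value after the final period
    best = []
    for t in range(T - 1, -1, -1):    # periods T-1, ..., 0
        new_value, best = [], []
        for s in range(n_states):
            q = [
                reward(t, s, a)
                + discount * sum(p * v for p, v in zip(trans(t, s, a), value))
                for a in range(n_actions)
            ]
            top = max(q)
            new_value.append(top)
            # Keep every maximizer: ties are exactly the "multiple optimal
            # first-period policies" situation.
            best.append(frozenset(a for a in range(n_actions) if top - q[a] <= 1e-9))
        value = new_value
    return best


# Hypothetical toy instance: states 0/1 are a low/high offer, state 2 is "sold".
OFFERS = [1.0, 2.0]
SOLD = 2


def reward(t, s, a):
    if s == SOLD:
        return 0.0
    if a == 1:                        # action 1: sell at the current offer
        return OFFERS[s]
    return -0.1 * (t + 1)             # action 0: hold, with a time-varying cost


def trans(t, s, a):
    if s == SOLD or a == 1:
        return [0.0, 0.0, 1.0]        # selling (or already sold) is absorbing
    return [0.5, 0.5, 0.0]            # next offer is low or high with equal odds


for T in range(1, 6):
    sets = first_period_actions(T, 3, 2, reward, trans, 0.9)
    print(T, [sorted(acts) for acts in sets])
```

In this toy instance the first-period action sets are the same for every T of at least 2 (hold on the low offer, sell on the high offer, indifferent once sold). Observing agreement over a few horizons does not by itself certify a forecast horizon; the guarantee described in the abstract relies on the stated monotonicity assumptions, which this sketch does not exploit.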
