A linear programming approach to constrained nonstationary infinite-horizon Markov decision processes

Constrained Markov decision processes (MDPs) are MDPs that optimize an objective function subject to additional constraints. We study a class of infinite-horizon constrained MDPs with nonstationary problem data, a finite state space, and a discounted cost criterion. This problem can equivalently be formulated as a countably infinite linear program (CILP), i.e., a linear program (LP) with countably infinitely many variables and constraints. Unlike finite LPs, CILPs can fail to satisfy useful theoretical properties such as duality, and no general solution method for such problems exists to date. In particular, the characterization of extreme points as basic feasible solutions in finite LPs does not extend to general CILPs. In this paper, we provide duality results and a complete characterization of the extreme points of the CILP formulation of constrained nonstationary MDPs with finite state space, and we illustrate the characterization for special cases. As a corollary, we obtain the existence of a K-randomized optimal policy, where K is the number of constraints.
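
To make the CILP formulation concrete, the following is a sketch of one standard occupation-measure formulation. The notation is assumed for illustration and is not taken from the paper: periods n = 1, 2, ..., a finite state set S, a finite action set A, a discount factor \alpha in (0,1), an initial distribution \beta, transition probabilities p_n(s' | s, a), bounded objective costs c_n(s,a), bounded constraint costs d^k_n(s,a) with bounds V_k for k = 1, ..., K, and variables x_n(s,a) interpreted as the probability of occupying state s and choosing action a in period n.

```latex
% Sketch under assumed notation; the paper's own formulation may differ.
\begin{align*}
\min_{x \ge 0} \quad
  & \sum_{n=1}^{\infty} \sum_{s \in S} \sum_{a \in A}
    \alpha^{n-1} c_n(s,a)\, x_n(s,a) \\
\text{s.t.} \quad
  & \sum_{a \in A} x_1(s,a) = \beta(s),
  && s \in S, \\
  & \sum_{a \in A} x_n(s,a)
    - \sum_{s' \in S} \sum_{a \in A} p_{n-1}(s \mid s', a)\, x_{n-1}(s',a) = 0,
  && s \in S,\ n \ge 2, \\
  & \sum_{n=1}^{\infty} \sum_{s \in S} \sum_{a \in A}
    \alpha^{n-1} d^k_n(s,a)\, x_n(s,a) \le V_k,
  && k = 1, \dots, K.
\end{align*}
```

Under these assumptions the program has one variable per (period, state, action) triple and one flow-balance constraint per (period, state) pair, so both sets are countably infinite, while the K side constraints each couple all periods at once. A feasible x induces a randomized Markov policy by choosing action a in state s at period n with probability proportional to x_n(s,a) whenever that state is reached with positive probability.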
