Near Optimality of Quantized Policies in Stochastic Control Under Weak Continuity Conditions

This paper studies the approximation of optimal control policies by quantized (discretized) policies for a broad class of Markov decision processes (MDPs). The problem is motivated by applications in networked control systems, computational methods for MDPs, and learning algorithms for MDPs. We consider the finite-action approximation of stationary policies for a discrete-time MDP under the discounted and average cost criteria, assuming only weak continuity of the transition probability, which significantly relaxes the conditions required in earlier literature. The discretization is constructive, and quantized policies are shown to approximate optimal deterministic stationary policies with arbitrary precision. The results are applied to the fully observed reduction of a partially observed Markov decision process, where weak continuity is a much more reasonable assumption than more stringent conditions such as strong continuity or continuity in total variation.
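To make the idea of a quantized policy concrete, the following is a minimal sketch and not the paper's construction. It assumes a one-dimensional compact action space [0, 1], a uniform grid as the finite action set, and a hypothetical deterministic stationary policy `policy`; the names `make_grid`, `quantize_policy`, and `max_gap` are illustrative only. The quantized policy is the original policy composed with a nearest-neighbor map onto the grid.

```python
import math

def make_grid(n):
    """Return n evenly spaced actions in [0, 1] (assumes n >= 2)."""
    return [i / (n - 1) for i in range(n)]

def quantize_policy(policy, grid):
    """Compose a policy with the nearest-neighbor map onto a finite grid."""
    def quantized(x):
        a = policy(x)
        # Pick the grid action closest to the action the policy prescribes.
        return min(grid, key=lambda g: abs(g - a))
    return quantized

def policy(x):
    # Hypothetical deterministic stationary policy: state -> action in [0, 1].
    return 0.5 * (1.0 + math.sin(x))

# Sample states used only to measure the action gap in this illustration.
states = [0.1 * k for k in range(100)]

def max_gap(n):
    """Largest distance between original and quantized actions over the samples."""
    q = quantize_policy(policy, make_grid(n))
    return max(abs(policy(x) - q(x)) for x in states)

for n in (5, 17, 65):
    print(n, max_gap(n))
```

On a uniform grid with n points, the action gap is at most 1/(2(n-1)), so it shrinks as the grid is refined. Closeness of actions alone does not imply closeness of costs; that step is what the paper establishes under the weak continuity assumption.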
