On Near Optimality of the Set of Finite-State Controllers for Average Cost POMDP

We consider the average cost problem for partially observable Markov decision processes (POMDPs) with finite state, observation, and control spaces. We prove that for any ε > 0 there exists an ε-optimal finite-state controller (FSC) that is functionally independent of the initial state distribution, under the assumption that the optimal liminf average cost function of the POMDP is constant. As part of the proof, we establish that if the optimal liminf average cost function is constant, then the optimal limsup average cost function is also constant, and the two are equal. We also discuss the connection between the existence of nearly optimal finite-history controllers and two other important issues for average cost POMDPs: the existence of an average cost that is independent of the initial state distribution, and the existence of a bounded solution to the constant average cost optimality equation.
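For reference, the constant average cost optimality equation mentioned above can be sketched in the standard belief-MDP formulation of a POMDP (the symbols \(b\), \(\phi\), and \(\hat{c}\) below are standard notation, not taken from the abstract):

\[
\lambda + h(b) \;=\; \min_{u \in U} \Big\{ \hat{c}(b,u) \;+\; \sum_{o \in O} P(o \mid b, u)\, h\big(\phi(b,u,o)\big) \Big\}, \qquad b \in \mathcal{B},
\]

where \(\lambda\) is the constant optimal average cost, \(h\) is a relative cost (bias) function on the belief space \(\mathcal{B}\), \(\hat{c}(b,u) = \sum_{s} b(s)\, c(s,u)\) is the expected one-stage cost under belief \(b\), and \(\phi(b,u,o)\) is the Bayesian belief update after taking control \(u\) and observing \(o\). The existence of a bounded solution \((\lambda, h)\) to this equation is one of the issues the abstract relates to nearly optimal finite-history controllers.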
