On Near Optimality of the Set of Finite-State Controllers for Average Cost POMDP
[1] Onésimo Hernández-Lerma, et al. Controlled Markov Processes, 1965.
[2] S. Ross. Arbitrary State Markovian Decision Processes, 1968.
[3] E. Fainberg. An $\varepsilon$-Optimal Control of a Finite Markov Chain with an Average Reward Criterion, 1980.
[4] E. Fainberg. Non-Randomized Markov and Semi-Markov Strategies in Dynamic Programming, 1982.
[5] E. Fainberg. Controlled Markov Processes with Arbitrary Numerical Criteria, 1983.
[6] K.-J. Bierth. An Expected Average Reward Criterion, 1987.
[7] Ari Arapostathis, et al. On the Average Cost Optimality Equation and the Structure of Optimal Policies for Partially Observable Markov Decision Processes, 1991, Ann. Oper. Res.
[8] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[9] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[10] L. Stettner, et al. Approximations of Discrete Time Partially Observed Control Problems, 1994.
[11] Michael I. Jordan. Graphical Models, 1998.
[12] Kee-Eung Kim, et al. Learning Finite-State Controllers for Partially Observable Environments, 1999, UAI.
[13] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[14] Huizhen Yu, et al. A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies, 2005, UAI.
[15] D. Bertsekas, et al. Approximate Solution Methods for Partially Observable Markov and Semi-Markov Decision Processes, 2006.
[16] L. Platzman. Optimal Infinite-Horizon Undiscounted Control of Finite Probabilistic Systems, 2006.
[17] Ari Arapostathis, et al. On the Existence of Stationary Optimal Policies for Partially Observed MDPs under the Long-Run Average Cost Criterion, 2006, Syst. Control. Lett.
[18] Dimitri P. Bertsekas, et al. Stochastic Optimal Control: The Discrete Time Case, 2007.