The policy iteration algorithm for average reward Markov decision processes with general state space
[1] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[2] Onésimo Hernández-Lerma,et al. Controlled Markov Processes , 1965 .
[3] C. Derman. Denumerable State Markovian Decision Processes: Average Cost Criterion , 1966 .
[4] Huibert Kwakernaak,et al. Linear Optimal Control Systems , 1972 .
[5] Arie Hordijk,et al. Dynamic programming and Markov potential theory , 1974 .
[6] E. Nummelin. General irreducible Markov chains and non-negative operators , 1984 .
[7] L. Sennott. A new condition for the existence of optimal stationary policies in average cost Markov decision processes , 1986 .
[8] R. Dekker. Counter examples for compact action Markov decision chains with average reward criteria , 1987 .
[9] Martin L. Puterman,et al. On the Convergence of Policy Iteration in Finite State Undiscounted Markov Decision Processes: The Unichain Case , 1987, Math. Oper. Res..
[10] R. Weber,et al. Optimal control of service rates in networks of queues , 1987, Advances in Applied Probability.
[11] M. Kurano. Learning Algorithms for Markov Decision Processes , 1987 .
[12] Linn I. Sennott,et al. Average Cost Optimal Stationary Policies in Infinite State Markov Decision Processes with Unbounded Costs , 1989, Oper. Res..
[13] P. Glynn. A Lyapunov Bound for Solutions of Poisson's Equation , 1989 .
[15] P. Whittle. Risk-Sensitive Optimal Control , 1990 .
[16] Marie Duflo. Méthodes récursives aléatoires , 1990 .
[17] E. Nummelin. On the Poisson equation in the potential theory of a single kernel , 1991 .
[18] V. Borkar. Topics in controlled Markov chains , 1991 .
[19] O. Hernández-Lerma,et al. Recurrence conditions for Markov decision processes with Borel state space: A survey , 1991 .
[20] Linn I. Sennott,et al. Optimal Stationary Policies in General State Space Markov Decision Chains with Finite Action Sets , 1992, Math. Oper. Res..
[21] Sean P. Meyn,et al. Generalized Resolvents and Harris Recurrence of Markov Processes , 1992 .
[22] James Randolph Perkins. Control of push and pull manufacturing systems , 1993 .
[23] S. Meyn,et al. Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time processes , 1993, Advances in Applied Probability.
[24] Richard L. Tweedie,et al. Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.
[25] M. K. Ghosh,et al. Discrete-time controlled Markov processes with average cost criterion: a survey , 1993 .
[26] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[27] Sean P. Meyn,et al. Stability of Generalized Jackson Networks , 1994 .
[28] Sean P. Meyn,et al. Duality and linear programs for stability and performance analysis of queueing networks and scheduling policies , 1994, Proceedings of 1994 33rd IEEE Conference on Decision and Control.
[29] Sean P. Meyn. Transience of Multiclass Queueing Networks Via Fluid Limit Models , 1995 .
[30] Gideon Weiss,et al. On optimal draining of re-entrant fluid lines , 1995 .
[31] S. Meyn,et al. Exponential and Uniform Ergodicity of Markov Processes , 1995 .
[32] Florin Avram,et al. Fluid models of sequencing problems in open queueing networks; an optimal control approach , 1995 .
[33] J. Dai. On Positive Harris Recurrence of Multiclass Queueing Networks: A Unified Approach Via Fluid Limit Models , 1995 .
[34] Sean P. Meyn,et al. Stability and convergence of moments for multiclass queueing networks via fluid limit models , 1995, IEEE Trans. Autom. Control..
[35] Sunil Kumar,et al. Fluctuation smoothing policies are stable for stochastic re-entrant lines , 1996, Discret. Event Dyn. Syst..
[36] Gideon Weiss,et al. Stability and Instability of Fluid Models for Reentrant Lines , 1996, Math. Oper. Res..
[37] Sean P. Meyn,et al. Duality and linear programs for stability and performance analysis of queuing networks and scheduling policies , 1996, IEEE Trans. Autom. Control..
[38] Linn I. Sennott,et al. The convergence of value iteration in average cost Markov decision chains , 1996, Oper. Res. Lett..
[39] Sean P. Meyn,et al. Fluid Network Models: Linear Programs for Control and Performance Bounds , 1996 .
[40] Rolando Cavazos-Cadena,et al. Value iteration in a class of average controlled Markov chains with unbounded costs: necessary and sufficient conditions for pointwise convergence , 1996, Journal of Applied Probability.
[41] R. Cavazos-Cadena. Value Iteration in a Class of Communicating Markov Decision Chains with the Average Cost Criterion , 1996 .
[42] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[43] Sean P. Meyn,et al. A Liapounov bound for solutions of the Poisson equation , 1996 .
[44] O. Hernández-Lerma,et al. Policy Iteration for Average Cost Markov Control Processes on Borel Spaces , 1997 .
[45] O. Hernández-Lerma,et al. Discrete-time Markov control processes , 1999 .
[46] Ann Appl,et al. On the Positive Harris Recurrence for Multiclass Queueing Networks: a Unified Approach via Fluid Limit Models , 1999 .
[47] Sean P. Meyn,et al. Value iteration and optimization of multiclass queueing networks , 1999, Queueing Syst. Theory Appl..