An improved algorithm for solving communicating average reward Markov decision processes
[1] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[2] D. Blackwell. Discrete Dynamic Programming, 1962.
[3] C. Derman. Denumerable State Markovian Decision Processes: Average Cost Criterion, 1966.
[4] Bennett L. Fox, et al. Scientific Applications: An algorithm for identifying the ergodic subchains and transient states of a stochastic matrix, 1967, Commun. ACM.
[5] J. Bather. Optimal decision procedures for finite Markov chains. Part II: Communicating systems, 1973, Advances in Applied Probability.
[6] Martin L. Puterman, et al. On the Convergence of Policy Iteration in Finite State Undiscounted Markov Decision Processes: The Unichain Case, 1987, Math. Oper. Res.
[7] Katsuhisa Ohno, et al. Computing Optimal Policies for Controlled Tandem Queueing Systems, 1987, Oper. Res.
[8] J. Filar, et al. Communicating MDPs: Equivalence and LP properties, 1988.
[9] Peter W. Jones, et al. Stochastic Modelling and Analysis, 1988.
[10] Keith W. Ross, et al. Multichain Markov Decision Processes with a Sample Path Constraint: A Decomposition Approach, 1991, Math. Oper. Res.