
On the Empirical State-Action Frequencies in Markov Decision Processes Under General Policies

We consider the empirical state-action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and establish that every element of this polytope is the limit of the empirical frequency vector, under some policy, in a strong sense. Furthermore, we show that the probability of exceeding a given distance between the empirical frequency vector and the polytope decays exponentially with time under every policy. We provide similar results for vector-valued empirical rewards.
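As a rough illustration of the objects the abstract refers to, the sketch below simulates a small MDP and computes its empirical state-action frequency vector. Everything in it is an assumption made for illustration: the two-state transition table `P`, the uniformly random policy, and the helper names are hypothetical and not taken from the paper. The balance check uses the standard flow-conservation constraints for state-action frequencies, \sum_a q(s',a) = \sum_{s,a} q(s,a) P(s'|s,a); treating these as the paper's polytope is also an assumption, not a quotation of its definition.

```python
import random

# Hypothetical 2-state, 2-action MDP (not from the paper).
# P[s][a] is the next-state distribution when action a is taken in state s.
P = {
    0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
    1: {0: [0.5, 0.5], 1: [0.1, 0.9]},
}

def empirical_frequencies(policy, steps, seed=0):
    """Fraction of time each (state, action) pair is visited over `steps` steps."""
    rng = random.Random(seed)
    counts = {(s, a): 0 for s in P for a in P[s]}
    s = 0  # arbitrary initial state
    for _ in range(steps):
        a = policy(s, rng)
        counts[(s, a)] += 1
        s = rng.choices([0, 1], weights=P[s][a])[0]
    return {sa: c / steps for sa, c in counts.items()}

def balance_residual(q):
    """Largest violation of the flow-conservation constraints by q."""
    worst = 0.0
    for s_next in P:
        outflow = sum(q[(s_next, a)] for a in P[s_next])
        inflow = sum(q[(s, a)] * P[s][a][s_next] for s in P for a in P[s])
        worst = max(worst, abs(outflow - inflow))
    return worst

def uniform_policy(s, rng):
    # Illustrative policy: pick one of the two actions uniformly at random.
    return rng.randrange(2)

q = empirical_frequencies(uniform_policy, steps=20000)
print(q)
print("balance residual:", balance_residual(q))
```

For a long run the residual should be small, which is the qualitative behavior the abstract describes: the empirical frequency vector ends up close to the constraint set. The sketch says nothing about the exponential decay rate established in the paper.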

John N. Tsitsiklis | Shie Mannor

[1] H. D. Miller. A Convexity Property in the Theory of Random Variables Defined on a Finite Markov Chain, 1961.

[2] W. M. Hirsch. A strong law for the maximum cumulative sum of independent random variables, 1965.

[3] Cyrus Derman et al. Finite State Markovian Decision Processes, 1970.

[4] B. Hajek. Hitting-time and occupation-time bounds implied by drift analysis with applications, 1982, Advances in Applied Probability.

[5] John S. Edwards et al. Linear Programming and Finite Markovian Control Problems, 1983.

[6] Pravin Varaiya et al. Stochastic Systems: Estimation, Identification, and Adaptive Control, 1986.

[7] E. Altman et al. Markov decision problems and state-action frequencies, 1991.

[8] N. Shimkin. Extremal large deviations in controlled i.i.d. processes with applications to hypothesis testing, 1993, Advances in Applied Probability.

[9] Eitan Altman et al. Rate of Convergence of Empirical Measures and Costs in Controlled Markov Chains and Transient Optimality, 1994, Math. Oper. Res.

[10] Martin L. Puterman et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[11] Robert G. Gallager et al. Discrete Stochastic Processes, 1995.

[12] John N. Tsitsiklis et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[13] John N. Tsitsiklis et al. Introduction to linear optimization, 1997, Athena scientific optimization and computation series.

[14] Andrew G. Barto et al. Reinforcement learning, 1998.

[15] Amir Dembo et al. Large Deviations Techniques and Applications, 1998.

[16] Richard S. Sutton et al. Introduction to Reinforcement Learning, 1998.

[17] S. Meyn et al. Multiplicative ergodicity and large deviations for an irreducible Markov chain, 2000.

[18] S. Balaji et al. Multiplicative Ergodicity and Large Deviations for an Irreducible Markov Chain, 2000.

[19] P. Glynn et al. Hoeffding's inequality for uniformly ergodic Markov chains, 2002.