Paper Information - Approximate Equivalence of Markov Decision Processes

Approximate Equivalence of Markov Decision Processes

We consider the problem of finding the minimal ε-equivalent MDP for an MDP given in its tabular form. We show that the problem is NP-hard and then give a bicriteria approximation algorithm for it. We suggest that the right measure for finding a minimal ε-equivalent model is L1 rather than L∞, giving both an example that demonstrates the drawback of using L∞ and performance guarantees for using L1. In addition, we give a polynomial-time algorithm that decides whether two MDPs are equivalent.

Yishay Mansour | Eyal Even-Dar
