Metrics for Markov Decision Processes with Infinite State Spaces

We present metrics for measuring state similarity in Markov decision processes (MDPs) with infinitely many states, including MDPs with continuous state spaces. Such metrics provide a stable quantitative analogue of the notion of bisimulation for MDPs and are suitable for use in MDP approximation. We show that the optimal value function associated with a discounted infinite-horizon planning task varies continuously with respect to our metric distances.
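To make the continuity claim concrete, here is a minimal sketch on a toy problem. It is not the construction from the paper. It assumes a hypothetical finite MDP with deterministic transitions, where the transport distance between next-state distributions reduces to the distance between the two successor states. The names `REWARD`, `NEXT`, `GAMMA`, `optimal_values`, and `state_metric` are illustrative assumptions, as is the choice of weights (1 on reward differences, `GAMMA` on successor distances). Under these assumptions the gap between optimal values of two states is bounded by their metric distance.

```python
from itertools import product

GAMMA = 0.9  # discount factor (assumed for illustration)

# Hypothetical toy MDP: 3 states, 2 actions, deterministic transitions.
REWARD = {  # REWARD[state][action]
    0: {0: 1.0, 1: 0.0},
    1: {0: 1.0, 1: 0.1},
    2: {0: 0.0, 1: 0.5},
}
NEXT = {  # NEXT[state][action] -> successor state
    0: {0: 0, 1: 2},
    1: {0: 1, 1: 2},
    2: {0: 2, 1: 0},
}
STATES = sorted(REWARD)
ACTIONS = [0, 1]


def optimal_values(iters=500):
    """Approximate the optimal value function by value iteration."""
    v = {s: 0.0 for s in STATES}
    for _ in range(iters):
        v = {
            s: max(REWARD[s][a] + GAMMA * v[NEXT[s][a]] for a in ACTIONS)
            for s in STATES
        }
    return v


def state_metric(iters=500):
    """Iterate a bisimulation-style distance to an approximate fixed point.

    d(s, t) = max_a ( |r(s,a) - r(t,a)| + GAMMA * d(next(s,a), next(t,a)) )
    With deterministic transitions, the distance between next-state
    distributions is just the distance between the two successors.
    """
    d = {(s, t): 0.0 for s, t in product(STATES, STATES)}
    for _ in range(iters):
        d = {
            (s, t): max(
                abs(REWARD[s][a] - REWARD[t][a])
                + GAMMA * d[(NEXT[s][a], NEXT[t][a])]
                for a in ACTIONS
            )
            for s, t in product(STATES, STATES)
        }
    return d


if __name__ == "__main__":
    v = optimal_values()
    d = state_metric()
    for s, t in product(STATES, STATES):
        # Value gap should never exceed the metric distance.
        print(s, t, round(abs(v[s] - v[t]), 4), round(d[(s, t)], 4))
```

In this sketch, states that are close under `state_metric` necessarily have close optimal values, which is the finite, deterministic shadow of the continuity result stated above. Handling stochastic or continuous-state transitions would require an actual transport-distance computation between distributions, which this toy version deliberately avoids.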
