Bounding Performance Loss in Approximate MDP Homomorphisms

We define a metric for measuring behavior similarity between states in a Markov decision process (MDP) that takes action similarity into account. We show that the kernel of our metric corresponds exactly to the classes of states defined by MDP homomorphisms (Ravindran & Barto, 2003). We prove that the difference in optimal value between two states is upper-bounded by this metric, and that the bound is tighter than previous bounds provided by bisimulation metrics (Ferns et al., 2004; 2005). Our results apply both to discrete and to continuous action spaces. We also provide an algorithm that uses this metric to construct approximate homomorphisms, by identifying states that can be grouped together as well as actions that can be matched; previous work on this construction relied mainly on heuristics.
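To illustrate the kind of computation involved, here is a minimal Python sketch of a lax-bisimulation-style metric on a small finite MDP, in which states are compared after optimally matching their actions and next-state distributions are compared with a Kantorovich (optimal-transport) distance, as in the bisimulation metrics of Ferns et al. The function names (`kantorovich`, `lax_bisim_metric`), the constants `c_r` and `c_t`, and the fixed-point iteration are illustrative assumptions, not the paper's exact definitions.

```python
# A minimal sketch, not the paper's exact algorithm: a lax-bisimulation-
# style metric over a small finite MDP. Names and constants here are
# illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

def kantorovich(p, q, d):
    """Kantorovich (optimal-transport) distance between next-state
    distributions p and q, using the current metric d as ground cost."""
    n = len(p)
    A_eq = []
    for i in range(n):  # row marginals: coupling rows must sum to p
        row = np.zeros((n, n)); row[i, :] = 1.0
        A_eq.append(row.ravel())
    for j in range(n):  # column marginals: coupling columns must sum to q
        col = np.zeros((n, n)); col[:, j] = 1.0
        A_eq.append(col.ravel())
    res = linprog(d.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun

def lax_bisim_metric(R, P, c_r=1.0, c_t=0.9, iters=100, tol=1e-6):
    """Iterate the metric map to a fixed point.

    R[s, a]    : immediate reward for action a in state s
    P[s, a, :] : next-state distribution for action a in state s

    Action similarity enters through the min over matched action pairs;
    taking the max over both directions (a Hausdorff-style distance on
    action sets) ensures every action of each state is accounted for.
    """
    n_s, n_a = R.shape
    d = np.zeros((n_s, n_s))
    for _ in range(iters):
        d_new = np.zeros_like(d)
        for s in range(n_s):
            for t in range(s + 1, n_s):
                # cost of matching action a (in s) with action b (in t)
                delta = np.array([[c_r * abs(R[s, a] - R[t, b])
                                   + c_t * kantorovich(P[s, a], P[t, b], d)
                                   for b in range(n_a)]
                                  for a in range(n_a)])
                val = max(delta.min(axis=1).max(), delta.min(axis=0).max())
                d_new[s, t] = d_new[t, s] = val
        if np.abs(d_new - d).max() < tol:
            return d_new
        d = d_new
    return d

# Two states whose actions are relabeled versions of each other.
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.1, 0.9], [0.9, 0.1]]])
print(lax_bisim_metric(R, P))
```

In this example the two states' action sets are permutations of each other, so the metric evaluates to zero and both states fall into the kernel, i.e., the same block of the induced homomorphism, even though no single fixed action labeling aligns them. This is exactly the kind of action matching described above.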

[1] Ward Whitt. Approximations of Dynamic Programs, I. Mathematics of Operations Research, 1978.

[2] Robin Milner. Communication and Concurrency. Prentice Hall International Series in Computer Science, 1989.

[3] Kim G. Larsen and Arne Skou. Bisimulation through Probabilistic Testing. Information and Computation, 1991.

[4] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 1994.

[5] Alison L. Gibbs and Francis E. Su. On Choosing and Bounding Probability Metrics. International Statistical Review, 2002. arXiv:math/0209021.

[6] Robert Givan, Thomas Dean, and Matthew Greig. Equivalence Notions and Model Minimization in Markov Decision Processes. Artificial Intelligence, 2003.

[7] Balaraman Ravindran and Andrew G. Barto. Relativized Options: Choosing the Right Transformation. ICML, 2003.

[8] Norm Ferns, Prakash Panangaden, and Doina Precup. Metrics for Finite Markov Decision Processes. AAAI, 2004.

[9] Norm Ferns, Prakash Panangaden, and Doina Precup. Metrics for Markov Decision Processes with Infinite State Spaces. UAI, 2005.

[10] Norm Ferns, Pablo Samuel Castro, Doina Precup, and Prakash Panangaden. Methods for Computing State Similarity in Markov Decision Processes. UAI, 2006.

[11] S. Arun-Kumar. On Bisimilarities Induced by Relations on Actions. Fourth IEEE International Conference on Software Engineering and Formal Methods (SEFM), 2006.

[12] Alicia P. Wolfe and Andrew G. Barto. Decision Tree Methods for Finding Reusable MDP Homomorphisms. AAAI, 2006.

[13] Lihong Li, Thomas J. Walsh, and Michael L. Littman. Towards a Unified Theory of State Abstraction for MDPs. AI&M, 2006.

[14] Balaraman Ravindran and Andrew G. Barto. Approximate Homomorphisms: A Framework for Non-exact Minimization in Markov Decision Processes. KBCS, 2004.