Long-Term Values in Markov Decision Processes, (Co)Algebraically

This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, which resemble MDPs but lack rewards, have been studied extensively, including coalgebraically, from the perspective of program semantics. Here we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions are (i) a coinductive explanation of policy improvement using a new proof principle, based on Banach’s Fixpoint Theorem, that we call contraction coinduction, and (ii) a proof that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.
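
To make the role of Banach's Fixpoint Theorem concrete, the following is a minimal sketch, not the paper's categorical construction. It assumes a finite state space and a fixed policy, so that the MDP reduces to a Markov chain with rewards. The names `bellman_operator` and `long_term_value`, the two-state chain, and the discount factor 0.9 are all illustrative assumptions. The long-term value with respect to discounted sums is then the unique fixed point of a map that is a contraction in the sup metric, and iterating that map from any starting point converges to it.

```python
def bellman_operator(values, transitions, rewards, gamma):
    """One application of the fixed policy's Bellman operator.

    transitions[s] maps each successor state t to its probability,
    rewards[s] is the immediate reward in state s, and 0 <= gamma < 1
    is the discount factor (all hypothetical names for this sketch).
    """
    return [
        rewards[s] + gamma * sum(p * values[t] for t, p in transitions[s].items())
        for s in range(len(values))
    ]


def sup_distance(u, v):
    """Sup-metric distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


def long_term_value(transitions, rewards, gamma, tol=1e-10):
    """Iterate the Bellman operator until successive iterates are tol-close.

    Because the operator is a gamma-contraction in the sup metric,
    the iteration converges to its unique fixed point.
    """
    values = [0.0] * len(rewards)
    while True:
        new_values = bellman_operator(values, transitions, rewards, gamma)
        if sup_distance(new_values, values) < tol:
            return new_values
        values = new_values


# Illustrative two-state chain induced by some fixed policy (assumed data).
transitions = [{0: 0.5, 1: 0.5}, {1: 1.0}]
rewards = [1.0, 0.0]
gamma = 0.9

values = long_term_value(transitions, rewards, gamma)
```

In this example state 1 is absorbing with reward 0, so its value is 0, and the value of state 0 solves v = 1 + 0.9 · 0.5 · v, that is, v = 1/0.55. The iteration starts from the zero vector, but any bounded starting vector would reach the same fixed point.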
