(Co)Algebraic Techniques for Markov Decision Processes

Markov Decision Processes (MDPs) [11] are a family of probabilistic, state-based models used in planning under uncertainty and reinforcement learning. Informally, an MDP models a situation in which an agent (the decision maker) makes choices at each state of a process, and each choice leads to some reward and a probabilistic transition to a next state. The aim of the agent is to find an optimal policy, i.e., a way of choosing actions that maximizes future expected rewards. The classic theory of MDPs with discounting is well-developed (see [11, Chapter 6]), and indeed we do not prove any new results about MDPs as such. Our work is inspired by Bellman’s principle of optimality, which states the following: “An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision” [2, Chapter III.3]. This principle has clear coinductive overtones, and our aim is to situate it in a body of mathematics that is also concerned with infinite behavior and coinductive proof principles, i.e., in coalgebra. Probabilistic systems of a similar type have been studied extensively, also coalgebraically, in the area of program semantics (see for instance [5, 6, 14, 15]). Our focus is not so much on the observable behavior of MDPs viewed as computations as on their role in solving optimal planning problems. This abstract is based on [7], to which we refer for a more detailed account.
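For concreteness, one standard way to express the principle of optimality for discounted MDPs is the Bellman optimality equation; the notation below (optimal value function $V^*$, action set $A(s)$, reward function $R$, transition probabilities $P$, discount factor $\gamma \in [0,1)$) is not fixed in this abstract and is used only as an illustrative sketch:
\[
V^*(s) \;=\; \max_{a \in A(s)} \Big( R(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s,a)\, V^*(s') \Big).
\]
The optimal value thus reproduces itself under a one-step unfolding of the process; it is this fixed-point form of the principle that gives it its coinductive flavor.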