Finding the K best policies in a finite-horizon Markov decision process

Abstract: Directed hypergraphs are a general modelling and algorithmic tool that has been used successfully in many research areas, including artificial intelligence, database systems, fuzzy systems, propositional logic and transportation networks. However, modelling Markov decision processes with directed hypergraphs has not yet been considered. In this paper we consider finite-horizon Markov decision processes (MDPs) with finite state and action spaces and present an algorithm for finding the K best deterministic Markov policies. That is, we are interested in ranking the first K deterministic Markov policies in non-decreasing order under an additive criterion of optimality. The algorithm uses a directed hypergraph to model the finite-horizon MDP. We show that the problem of finding the optimal policy can be formulated as a minimum weight hyperpath problem and solved in time linear in the size of the input data representing the MDP, under different additive optimality criteria.
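To make the formulation concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes that each (stage, state) pair is a node and that each action is a hyperarc whose weight is the immediate cost and whose multipliers are the transition probabilities, so that the minimum weight hyperpath is obtained by ordinary backward induction. The toy data, the expected total cost criterion, and all names (ACTIONS, TERMINAL_COST, hyperarc_weight, k_best_brute_force) are hypothetical. The ranking step is plain enumeration, included only to show what "the K best deterministic Markov policies" means on a tiny instance.

```python
from itertools import product

# Hypothetical toy MDP: decisions at stages 0 and 1, terminal costs at stage 2.
HORIZON = 2
STATES = (0, 1)
TERMINAL_COST = {0: 0.0, 1: 4.0}
# ACTIONS[s][a] = (immediate cost, {successor state: transition probability})
ACTIONS = {
    0: {"a": (1.0, {0: 1.0}), "b": (0.0, {0: 0.5, 1: 0.5})},
    1: {"a": (2.0, {0: 1.0}), "b": (0.0, {1: 1.0})},
}


def hyperarc_weight(s, a, next_values):
    """Cost of action a in state s plus the expected value of its successors."""
    cost, probs = ACTIONS[s][a]
    return cost + sum(p * next_values[j] for j, p in probs.items())


def backward_induction():
    """Visit every action (hyperarc) once, from the last stage to the first."""
    values = dict(TERMINAL_COST)
    policy = {}
    for t in reversed(range(HORIZON)):
        new_values = {}
        for s in STATES:
            best = min(ACTIONS[s], key=lambda a: hyperarc_weight(s, a, values))
            policy[(t, s)] = best
            new_values[s] = hyperarc_weight(s, best, values)
        values = new_values
    return values, policy


def policy_value(policy, start):
    """Expected total cost of a fixed deterministic Markov policy from `start`."""
    values = dict(TERMINAL_COST)
    for t in reversed(range(HORIZON)):
        values = {s: hyperarc_weight(s, policy[(t, s)], values) for s in STATES}
    return values[start]


def k_best_brute_force(k, start):
    """Enumerate all policies and keep the k cheapest (exponential; illustration only)."""
    nodes = [(t, s) for t in range(HORIZON) for s in STATES]
    choices = [list(ACTIONS[s]) for _, s in nodes]
    ranked = []
    for combo in product(*choices):
        policy = dict(zip(nodes, combo))
        ranked.append((policy_value(policy, start), policy))
    ranked.sort(key=lambda pair: pair[0])
    return ranked[:k]


if __name__ == "__main__":
    print(backward_induction())
    for value, policy in k_best_brute_force(3, start=0):
        print(value, policy)
```

Backward induction evaluates each action exactly once, which is consistent with the linear-time claim for the optimal policy. The enumeration, by contrast, grows exponentially with the number of (stage, state) pairs, which is why a dedicated K best algorithm is needed for instances of realistic size.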
