A framework and a mean-field algorithm for the local control of spatial processes

The Markov Decision Process (MDP) framework is a tool for efficiently modelling and solving sequential decision-making problems under uncertainty. However, it reaches its limits when state and action spaces are large, as can happen for spatially explicit decision problems. Factored MDPs and dedicated solution algorithms have been introduced to deal with large, factored state spaces, but large action spaces remain an issue. In this article, we define graph-based Markov Decision Processes (GMDPs), a particular Factored MDP framework that exploits the factorization of both the state space and the action space of a decision problem; the two spaces are assumed to have the same dimension. Transition probabilities and rewards are factored according to a single graph structure, whose nodes represent the state/decision variable pairs of the problem. The complexity of this representation grows only linearly with the size of the graph, whereas the complexity of exact resolution grows exponentially. We propose an approximate solution algorithm that exploits the structure of a GMDP and whose complexity grows only quadratically with the size of the graph and exponentially with the maximum number of neighbours of any node. This algorithm, referred to as MF-API, belongs to the family of Approximate Policy Iteration (API) algorithms. It relies on a mean-field approximation of the value function of a policy and on a policy search restricted to the (generally suboptimal) set of local policies. We compare its performance with that of two state-of-the-art algorithms for Factored MDPs: SPUDD and Approximate Linear Programming (ALP). Our experiments show that SPUDD is not generally applicable to solving GMDPs, due to the size of the action spaces we want to tackle. On the other hand, ALP can be adapted to solve GMDPs. We show that ALP is faster than MF-API and provides solutions of similar quality for most problems. However, for some problems MF-API provides significantly better policies, and in all cases it provides a better approximation of the value function of the approximate policies. These promising results show that the GMDP model offers a convenient framework for modelling and solving a large range of spatial and structured planning problems that arise in many domains where processes are managed over networks: natural resources, agriculture, computer networks, etc.
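
To make the factored representation and the mean-field evaluation step concrete, the following is a minimal Python sketch, not the authors' implementation. The class and function names (`GMDP`, `mean_field_evaluate`, `local_transition`, `local_reward`) and the assumption that a local policy maps a node's own state to its action are illustrative; the paper's exact definition of local policies may be richer. The sketch propagates per-node state marginals under a product (mean-field) closure and accumulates discounted local rewards, so its cost grows linearly with the number of nodes and exponentially only with the neighbourhood size, in line with the complexity claims above.

```python
# Hypothetical sketch of a GMDP and mean-field evaluation of a local policy.
# All names and the policy class assumed here are illustrative, not the paper's code.

import itertools
import numpy as np


class GMDP:
    """Graph-based MDP: one state variable and one action variable per node.

    neighbours[i]                       : nodes whose states condition node i
                                          (assumed to include i itself).
    local_transition(i, x_nbrs, a_i)    : distribution over node i's next state,
                                          as a 1-D array of length n_states.
    local_reward(i, x_nbrs, a_i)        : scalar reward earned at node i.
    """

    def __init__(self, n_nodes, n_states, n_actions, neighbours,
                 local_transition, local_reward):
        self.n_nodes = n_nodes
        self.n_states = n_states
        self.n_actions = n_actions
        self.neighbours = neighbours
        self.local_transition = local_transition
        self.local_reward = local_reward


def mean_field_evaluate(gmdp, policy, q0, gamma=0.95, horizon=200):
    """Approximate the value of a local policy with a mean-field closure.

    policy[i] maps node i's own state to an action (one simple form of
    'local policy', assumed here for illustration). The joint state
    distribution is replaced at every step by a product of per-node
    marginals q[i], which keeps the cost linear in the number of nodes
    and exponential only in the neighbourhood size.
    """
    q = [np.asarray(m, dtype=float) for m in q0]   # per-node state marginals
    values = np.zeros(gmdp.n_nodes)                # per-node discounted rewards
    discount = 1.0
    for _ in range(horizon):
        new_q = []
        for i in range(gmdp.n_nodes):
            nbrs = gmdp.neighbours[i]
            next_marginal = np.zeros(gmdp.n_states)
            expected_reward = 0.0
            # Enumerate neighbourhood configurations, weighting each one by the
            # product of current marginals (the mean-field factorisation).
            for x_nbrs in itertools.product(range(gmdp.n_states), repeat=len(nbrs)):
                w = np.prod([q[j][x] for j, x in zip(nbrs, x_nbrs)])
                if w == 0.0:
                    continue
                a_i = policy[i][x_nbrs[nbrs.index(i)]]
                next_marginal += w * gmdp.local_transition(i, x_nbrs, a_i)
                expected_reward += w * gmdp.local_reward(i, x_nbrs, a_i)
            values[i] += discount * expected_reward
            new_q.append(next_marginal)
        q = new_q
        discount *= gamma
    return values.sum(), values
```

In an API loop of the kind the abstract describes, a step like this evaluation would be alternated with a local policy-improvement step at each node; the sketch only covers the evaluation half.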
