Symmetric Primal-Dual Approximate Linear Programming for Factored MDPs

A weakness of classical Markov decision processes (MDPs) is that they scale very poorly due to their flat state-space representation. Factored MDPs address this representational problem by exploiting problem structure to specify the transition and reward functions of an MDP compactly. In general, however, solutions to factored MDPs do not retain the structure and compactness of the problem representation, forcing approximate solutions; approximate linear programming (ALP) has emerged as a particularly promising MDP-approximation technique. To date, most ALP work has focused on the primal LP formulation, while the dual LP, which forms the basis for solving constrained Markov problems, has received much less attention. We show that a straightforward linear approximation of the dual optimization variables is problematic, because some of the required computations cannot be carried out efficiently. Nonetheless, we develop a composite approach that symmetrically approximates the primal and dual optimization variables (effectively approximating both the objective function and the feasible region of the LP), is computationally feasible, and is suitable for solving constrained MDPs. We show empirically that this new ALP formulation also performs well on unconstrained problems.
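For concreteness, the standard exact LP pair for a discounted MDP that the abstract builds on is sketched below, together with the linear approximations it describes; the basis matrices $H$ (primal) and $Q$ (dual) and the weight vectors $w$, $y$ are illustrative names, not notation taken from the paper.

\[
\begin{aligned}
\text{(Primal)}\quad &\min_{v}\; \alpha^{\top} v \quad \text{s.t.}\;\; v(s) \,\ge\, r(s,a) + \gamma \textstyle\sum_{s'} P(s' \mid s,a)\, v(s') \quad \forall\, s,a;\\
\text{(Dual)}\quad &\max_{\mu \ge 0}\; \textstyle\sum_{s,a} \mu(s,a)\, r(s,a) \quad \text{s.t.}\;\; \textstyle\sum_{a} \mu(s',a) - \gamma \textstyle\sum_{s,a} P(s' \mid s,a)\, \mu(s,a) \,=\, \alpha(s') \quad \forall\, s'.
\end{aligned}
\]

The primal ALP restricts the value variables to a small basis, $v \approx H w$; the symmetric approach described in the abstract additionally restricts the dual (occupation-measure) variables, $\mu \approx Q y$, so that constrained-MDP cost constraints of the form $\sum_{s,a} \mu(s,a)\, c_k(s,a) \le d_k$ remain linear and compact.

The following is a minimal runnable sketch of the exact dual LP on a tiny random MDP, using scipy.optimize.linprog; the problem sizes, seed, and greedy-policy readout are illustrative assumptions, not material from the paper, and a constrained MDP would simply add its cost rows via A_ub/b_ub.

```python
import numpy as np
from scipy.optimize import linprog

# Tiny random discounted MDP; sizes and seed are illustrative, not from the paper.
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.95

P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)      # P[s, a, s2]: transition probabilities
r = rng.random((S, A))                 # r[s, a]: rewards
alpha = np.full(S, 1.0 / S)            # initial state distribution

# Exact dual LP over occupation measures mu(s, a) >= 0:
#   max  sum_{s,a} mu(s,a) r(s,a)
#   s.t. sum_a mu(s2, a) - gamma * sum_{s,a} P(s2|s,a) mu(s,a) = alpha(s2)
# Column k = s * A + a indexes the flattened variable mu(s, a).
A_eq = np.zeros((S, S * A))
for s in range(S):
    for a in range(A):
        k = s * A + a
        A_eq[s, k] += 1.0                  # outflow mass at state s
        A_eq[:, k] -= gamma * P[s, a, :]   # discounted inflow to each s2

# linprog minimizes, so negate rewards; the default bounds enforce mu >= 0.
# A constrained MDP would add cost constraints here via A_ub / b_ub.
res = linprog(c=-r.reshape(-1), A_eq=A_eq, b_eq=alpha, method="highs")

mu = res.x.reshape(S, A)
print("optimal discounted return:", -res.fun)
print("greedy policy:", mu.argmax(axis=1))  # action with the largest visitation mass
```

Working in the dual is what makes constrained MDPs natural here: the extra cost constraints are linear in the occupation measure $\mu$, whereas they have no direct expression over the primal value variables.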
