Relaxations of Weakly Coupled Stochastic Dynamic Programs

We consider a broad class of stochastic dynamic programming problems that are amenable to relaxation via decomposition. These problems comprise multiple subproblems that are independent of each other except for a collection of coupling constraints on the action space. We fit an additively separable value function approximation using two techniques, namely, Lagrangian relaxation and the linear programming (LP) approach to approximate dynamic programming. We prove various results comparing the relaxations to each other and to the optimal problem value. We also provide a column generation algorithm for solving the LP-based relaxation to any desired optimality tolerance, and we report on numerical experiments on bandit-like problems. Our results provide insight into the complexity versus quality trade-off when choosing which of these relaxations to implement.
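
To make the Lagrangian relaxation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a discounted infinite-horizon bandit-like problem in which each subproblem has a binary action (0 = passive, 1 = active) and a single coupling constraint caps the number of active subproblems per period at a budget. Pricing the constraint with a multiplier lam >= 0 decouples the problem, and the separable bound lam * budget / (1 - beta) + sum of the subproblem values is at least the optimal value for every such lam. The instance data, the discount factor, and all function names are hypothetical and chosen only for illustration.

```python
import itertools
import math

BETA = 0.9  # discount factor (assumed for illustration)

# Hypothetical two-subproblem instance.
# REWARDS[i][s][a]: reward of subproblem i in state s under action a.
# TRANSITIONS[i][a][s][t]: probability that subproblem i moves from s to t under a.
REWARDS = [
    [[0.0, 1.0], [0.0, 2.0]],
    [[0.0, 1.5], [0.0, 0.5]],
]
TRANSITIONS = [
    [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.6, 0.4]]],
    [[[0.7, 0.3], [0.3, 0.7]], [[0.4, 0.6], [0.8, 0.2]]],
]
BUDGET = 1  # at most one subproblem may be active per period


def solve_subproblem(reward, trans, lam, beta=BETA, iters=500):
    """Value iteration for one subproblem, charging lam per active action."""
    n = len(reward)
    v = [0.0] * n
    for _ in range(iters):
        v = [
            max(
                reward[s][a] - lam * a
                + beta * sum(trans[a][s][t] * v[t] for t in range(n))
                for a in (0, 1)
            )
            for s in range(n)
        ]
    return v


def lagrangian_bound(rewards, transitions, budget, lam, state, beta=BETA):
    """Additively separable upper bound on the optimal value at `state`."""
    vals = [solve_subproblem(r, p, lam, beta) for r, p in zip(rewards, transitions)]
    return lam * budget / (1 - beta) + sum(v[s] for v, s in zip(vals, state))


def solve_joint(rewards, transitions, budget, beta=BETA, iters=500):
    """Exact value iteration on the joint state space (tiny instances only)."""
    n_sub = len(rewards)
    states = list(itertools.product(*(range(len(r)) for r in rewards)))
    # Joint actions that satisfy the coupling constraint.
    actions = [a for a in itertools.product((0, 1), repeat=n_sub) if sum(a) <= budget]
    v = {s: 0.0 for s in states}
    for _ in range(iters):
        new_v = {}
        for s in states:
            best = -math.inf
            for a in actions:
                q = sum(rewards[i][s[i]][a[i]] for i in range(n_sub))
                for t in states:
                    prob = math.prod(
                        transitions[i][a[i]][s[i]][t[i]] for i in range(n_sub)
                    )
                    q += beta * prob * v[t]
                best = max(best, q)
            new_v[s] = best
        v = new_v
    return v


if __name__ == "__main__":
    v_opt = solve_joint(REWARDS, TRANSITIONS, BUDGET)
    start = (0, 0)
    for lam in (0.0, 0.5, 1.0, 2.0):
        bound = lagrangian_bound(REWARDS, TRANSITIONS, BUDGET, lam, start)
        print(f"lam={lam:.1f}  bound={bound:.4f}  optimal={v_opt[start]:.4f}")
```

The joint solve enumerates the product state space, so it is practical only for toy instances, whereas the relaxation solves each subproblem separately. In practice one would minimize the bound over lam (for example by a line search) to obtain the tightest Lagrangian bound.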
