Revisiting Approximate Linear Programming Using a Saddle Point Based Reformulation and Root Finding Solution Approach

Approximate linear programs (ALPs) are well-known models for computing value function approximations (VFAs) for high-dimensional Markov decision processes (MDPs) arising in business applications. VFAs from ALPs have desirable theoretical properties, define an operating policy, and provide a lower bound on the optimal policy cost, which can be used to assess the suboptimality of heuristic policies. However, solving ALPs near optimally remains challenging, for instance, in applications where the MDP has nonlinear cost functions or transition dynamics, or where rich basis functions are required to obtain a good VFA. We address this tension between ALP theory and solvability by (i) proposing a saddle-point-based reformulation of an ALP that endogenizes a state-action density function as a dual decision variable to avoid non-convexities, and (ii) developing a solution approach, ALP-Secant, that combines root-finding and saddle point methods to solve this reformulation. We establish that ALP-Secant returns a near-optimal ALP solution and a lower bound on the optimal policy cost with high probability in a finite number of iterations. We numerically compare ALP-Secant with the commonly used constraint sampling approach to solving ALPs and with a look-ahead heuristic on inventory control and energy storage applications, where row generation is not a viable option. We find that ALP-Secant is more effective than constraint sampling for solving ALPs and delivers high-quality policies and lower bounds, with its policies outperforming those from the other two methods. Our ALP reformulation and solution approach broaden the applicability of approximate linear programming.
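The abstract takes the ALP formulation as given. For a concrete picture of the lower-bound property it mentions, the following is a minimal sketch of the standard ALP (not ALP-Secant and not the saddle point reformulation) on a hypothetical four-state lost-sales inventory MDP. The MDP parameters, the basis functions φ(s) = (1, s), and the uniform state-relevance weights are illustrative assumptions. Because this toy LP has only two variables, the sketch solves it exactly by enumerating vertices, then compares the ALP objective with the exact optimal cost obtained by value iteration.

```python
# Minimal sketch of the *standard* ALP on a hypothetical toy MDP.
# This is NOT ALP-Secant; all parameters below are illustrative assumptions.
from itertools import combinations

STATES = [0, 1, 2, 3]                # inventory on hand (capacity 3)
DEMAND = {0: 0.3, 1: 0.4, 2: 0.3}    # hypothetical demand distribution
GAMMA = 0.9                          # discount factor
ORDER_COST, HOLD_COST, LOST_SALE_COST = 1.0, 0.5, 4.0


def actions(s):
    """Order quantities that keep post-order inventory within capacity."""
    return range(0, max(STATES) - s + 1)


def cost(s, a):
    """Expected one-period cost: ordering + holding + lost-sales penalty."""
    y = s + a
    lost = sum(p * max(d - y, 0) for d, p in DEMAND.items())
    return ORDER_COST * a + HOLD_COST * y + LOST_SALE_COST * lost


def transitions(s, a):
    """(next_state, probability) pairs; unmet demand is lost."""
    y = s + a
    return [(max(y - d, 0), p) for d, p in DEMAND.items()]


def optimal_values(tol=1e-12):
    """Exact V* by value iteration (possible only because the MDP is tiny)."""
    v = {s: 0.0 for s in STATES}
    while True:
        new = {
            s: min(cost(s, a) + GAMMA * sum(p * v[t] for t, p in transitions(s, a))
                   for a in actions(s))
            for s in STATES
        }
        if max(abs(new[s] - v[s]) for s in STATES) < tol:
            return new
        v = new


def solve_alp():
    """Standard ALP with basis phi(s) = (1, s) and uniform weights nu:
        max_theta  sum_s nu(s) * (theta0 + theta1 * s)
        s.t.       theta0 + theta1*s <= c(s,a) + GAMMA * E[theta0 + theta1*s']  for all (s, a).
    With two variables, an optimal vertex is found by enumerating constraint pairs."""
    rows = []  # each constraint rewritten as coef0*theta0 + coef1*theta1 <= rhs
    for s in STATES:
        for a in actions(s):
            mean_next = sum(p * t for t, p in transitions(s, a))
            rows.append((1 - GAMMA, s - GAMMA * mean_next, cost(s, a)))
    mean_state = sum(STATES) / len(STATES)
    best_obj, best_theta = float("-inf"), None
    for (a0, a1, b), (c0, c1, d) in combinations(rows, 2):
        det = a0 * c1 - a1 * c0
        if abs(det) < 1e-12:
            continue  # parallel constraints do not define a vertex
        # Cramer's rule for the 2x2 system of the two active constraints
        theta = ((b * c1 - a1 * d) / det, (a0 * d - b * c0) / det)
        if all(r0 * theta[0] + r1 * theta[1] <= rhs + 1e-9 for r0, r1, rhs in rows):
            obj = theta[0] + theta[1] * mean_state
            if obj > best_obj:
                best_obj, best_theta = obj, theta
    return best_obj, best_theta


if __name__ == "__main__":
    v_star = optimal_values()
    alp_value, (theta0, theta1) = solve_alp()
    exact = sum(v_star.values()) / len(STATES)
    print(f"ALP lower bound {alp_value:.3f} <= exact expected cost {exact:.3f}")
```

Any VFA that satisfies every ALP constraint is a pointwise lower bound on the optimal cost function, so the ALP objective cannot exceed the exact expected cost. In realistic MDPs the standard ALP has one constraint per state-action pair, so vertex enumeration and exact value iteration are out of reach, which is where approaches such as constraint sampling, or the reformulation proposed here, come in.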
