Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC

We survey some recent research directions within the field of approximate dynamic programming, with a particular emphasis on rollout algorithms and model predictive control (MPC). We argue that while they are motivated by different concerns, these two methodologies are closely connected, and the mathematical essence of their desirable properties (cost improvement and stability, respectively) rests on the central dynamic programming idea of policy iteration. In particular, we show that the most common MPC schemes can be viewed as rollout algorithms and are related to policy iteration methods. Furthermore, we embed rollout and MPC within a new unifying suboptimal control framework, based on a concept of restricted or constrained structure policies, which contains these schemes as special cases.
