Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems

This paper presents a novel method of global adaptive dynamic programming (ADP) for the adaptive optimal control of nonlinear polynomial systems. The strategy consists of relaxing the problem of solving the Hamilton-Jacobi-Bellman (HJB) equation to an optimization problem, which is solved via a new policy iteration method. The proposed method differs from previously known nonlinear ADP methods in that neural network approximation is avoided, giving rise to significant computational improvement. The resulting control policy is globally stabilizing, rather than semiglobally or locally stabilizing, for a general class of nonlinear polynomial systems. Furthermore, in the absence of a priori knowledge of the system dynamics, an online learning method is devised to implement the proposed policy iteration technique by generalizing the current ADP theory. Finally, three numerical examples are provided to validate the effectiveness of the proposed method.
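
The paper's own algorithm is not reproduced here. As a minimal sketch of the classical policy iteration idea it builds on, the code below assumes a scalar linear system dx/dt = a*x + b*u with cost equal to the integral of q*x^2 + r*u^2, and a quadratic value function V(x) = p*x^2. The symbols a, b, q, r, k0 and the function names are illustrative assumptions, not taken from the paper; the nonlinear polynomial case and the online learning step are out of scope for this sketch.

```python
import math


def policy_iteration(a, b, q, r, k0, iters=20):
    """Policy iteration for dx/dt = a*x + b*u with cost integral of q*x^2 + r*u^2.

    Illustrative scalar linear case only (an assumption, not the paper's setting).
    The policy is u = -k*x and the value function is V(x) = p*x^2.
    """
    if a - b * k0 >= 0:
        raise ValueError("initial gain k0 must be stabilizing (a - b*k0 < 0)")
    k = k0
    history = []
    for _ in range(iters):
        # Policy evaluation: p solves 2*(a - b*k)*p + q + r*k^2 = 0.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: the minimizing gain for the current V is k = b*p/r.
        k = b * p / r
        history.append(p)
    return p, k, history


def riccati_solution(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - (b^2/r)*p^2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


# Hypothetical numbers chosen only for illustration.
A, B, Q, R = 1.0, 1.0, 1.0, 1.0
p_final, k_final, p_history = policy_iteration(A, B, Q, R, k0=3.0)
p_star = riccati_solution(A, B, Q, R)
```

In this simplified setting the HJB equation reduces to the scalar Riccati equation, so the iterates p_history can be compared directly against p_star; each evaluation step yields a cost no larger than the previous one, provided the initial gain is stabilizing.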
