Global optimality of approximate dynamic programming and its use in non-convex function minimization

Figure: Level curves of the Rosenbrock function subject to minimization, with state trajectories for the initial conditions x0 ∈ {-2, -1, 0, 1, 2} × {-2, -1, 0, 1, 2}. The red plus signs mark the initial point of each trajectory.

This study investigates the global optimality of approximate dynamic programming (ADP) based solutions that use neural networks for optimal control problems with fixed final time. It discusses whether the cost function terms and the system dynamics need to be convex functions of their respective inputs, and derives sufficient conditions for global optimality of the result. Next, a new idea is presented: using ADP with neural networks to minimize non-convex smooth functions. It is shown that any initial guess leads to direct movement toward the proximity of the global optimum of the function. This behavior contrasts with gradient-based optimization methods, in which the movement is guided by the shape of the local level curves. Illustrative examples with single-variable and multi-variable functions demonstrate the potential of the proposed method.
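To make the contrast with gradient-based methods concrete, the sketch below runs plain gradient descent with a backtracking line search on the Rosenbrock function from the same 5 × 5 grid of initial points. It is a baseline illustration only, not the ADP method proposed in the paper. The standard two-variable form with a = 1 and b = 100, the step count, the line-search constants, and all function names are assumptions chosen for the example.

```python
import itertools


def rosenbrock(x, y, a=1.0, b=100.0):
    """Standard two-variable Rosenbrock function (assumed form, a=1, b=100)."""
    return (a - x) ** 2 + b * (y - x ** 2) ** 2


def rosenbrock_grad(x, y, a=1.0, b=100.0):
    """Analytic gradient of the function above."""
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x ** 2)
    dy = 2.0 * b * (y - x ** 2)
    return dx, dy


def gradient_descent(x, y, steps=200):
    """Steepest descent with Armijo backtracking (hypothetical baseline).

    Each step follows the negative gradient, i.e. the direction normal to
    the local level curve, so the path is shaped by the local geometry.
    """
    for _ in range(steps):
        gx, gy = rosenbrock_grad(x, y)
        g2 = gx * gx + gy * gy
        if g2 == 0.0:  # stationary point reached
            break
        f0 = rosenbrock(x, y)
        t = 1.0
        # Halve the step until the sufficient-decrease condition holds.
        while t > 1e-20 and rosenbrock(x - t * gx, y - t * gy) > f0 - 1e-4 * t * g2:
            t *= 0.5
        if t <= 1e-20:  # no acceptable step found; stop without moving
            break
        x, y = x - t * gx, y - t * gy
    return x, y


# Same grid of initial conditions as in the figure caption.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
results = {}
for x0, y0 in itertools.product(grid, grid):
    xf, yf = gradient_descent(x0, y0)
    results[(x0, y0)] = (xf, yf, rosenbrock(xf, yf))

for start, (xf, yf, ff) in sorted(results.items()):
    print(f"start={start} -> end=({xf:.4f}, {yf:.4f}), f={ff:.3e}")
```

The line search guarantees that the function value never increases, but how close each run gets to the minimizer (1, 1) within the step budget depends on the starting point and on the curved valley of the function. That dependence is the behavior the abstract attributes to gradient-based methods.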
