Optimal Control for Unknown Discrete-Time Nonlinear Markov Jump Systems Using Adaptive Dynamic Programming

In this paper, we develop and analyze an optimal control method for a class of discrete-time nonlinear Markov jump systems (MJSs) with unknown system dynamics. Specifically, an identifier is established to approximate the states of the unknown system, and an optimal control approach based on adaptive dynamic programming is developed to solve the Hamilton-Jacobi-Bellman equation for nonlinear MJSs. We also provide a detailed stability analysis of the control approach, including the convergence of the performance index function for nonlinear MJSs and the existence of the corresponding admissible control. Neural networks are used to approximate the performance index function and the control law. Three simulation studies (a linear case, a nonlinear case, and a single-link robot arm case) validate the performance of the proposed optimal control method.
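As an illustration only, the convergence of the performance index function under value iteration can be sketched for the simplest possible setting: a scalar linear Markov jump system with known dynamics and a quadratic cost. This is not the identifier-based neural network scheme of the paper. The function name `value_iteration_mjls`, the mode parameters `a` and `b`, the transition matrix `P`, and the cost weights `q` and `r` are all hypothetical choices made for this sketch.

```python
import numpy as np

def value_iteration_mjls(a, b, P, q=1.0, r=1.0, iters=200):
    """Value iteration for a scalar Markov jump linear system (illustrative).

    Assumed model (not taken from the paper):
      mode i dynamics   x_next = a[i] * x + b[i] * u
      stage cost        q * x**2 + r * u**2
      mode transitions  P[i, j] = Prob(next mode = j | current mode = i)
    The performance index of mode i is parameterized as V_i(x) = p[i] * x**2.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    P = np.asarray(P, dtype=float)

    p = np.zeros_like(a)          # start from the zero performance index
    history = [p.copy()]
    for _ in range(iters):
        e = P @ p                 # expected next-step weight, given current mode
        # Bellman update after minimizing over u in closed form
        p = q + a**2 * e - (a * b * e) ** 2 / (r + b**2 * e)
        history.append(p.copy())

    e = P @ p
    gains = a * b * e / (r + b**2 * e)   # mode-dependent feedback u = -gains[i] * x
    return p, gains, np.array(history)

# Hypothetical two-mode example: mode 0 is open-loop unstable, mode 1 is stable.
p, gains, history = value_iteration_mjls(
    a=[1.2, 0.8],
    b=[1.0, 0.5],
    P=[[0.9, 0.1],
       [0.3, 0.7]],
)
```

Starting from a zero initial performance index, each row of `history` should be no smaller than the previous one, and the sequence should settle to a fixed point of the coupled Bellman update. The nonlinear, unknown-dynamics case treated in the paper replaces the closed-form quadratic update with neural network approximations.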

[1]  Feng Lin,et al.  Robust Control of Nonlinear Systems: Compensating for Uncertainty , 1990, 1990 American Control Conference.

[2]  Rainer Palm,et al.  Fuzzy switched hybrid systems-modeling and identification , 1998, Proceedings of the 1998 IEEE International Symposium on Intelligent Control (ISIC) held jointly with IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA) Intell.

[3]  H. Alzer On the Cauchy-Schwarz Inequality☆ , 1999 .

[4]  Feng-Yi Lin,et al.  An optimal control approach to robust control design , 2000 .

[5]  Benjamin Van Roy,et al.  Approximate Dynamic Programming via Linear Programming , 2001, NIPS.

[6]  Abdellah Benzaouia,et al.  Stability of discrete-time linear systems with Markovian jumping parameters and constrained control , 2002, IEEE Trans. Autom. Control..

[7]  Jamal Daafouz,et al.  Stability analysis and control synthesis for switched systems: a switched Lyapunov function approach , 2002, IEEE Trans. Autom. Control..

[8]  Yuguang Fang,et al.  Stochastic stability of jump linear systems , 2002, IEEE Trans. Autom. Control..

[9]  Zhi-Hong Guan,et al.  Guaranteed cost control for uncertain Markovian jump systems with mode-dependent time-delays , 2003, IEEE Trans. Autom. Control..

[10]  Benjamin Van Roy,et al.  The Linear Programming Approach to Approximate Dynamic Programming , 2003, Oper. Res..

[11]  M. Mahmoud,et al.  Robust Kalman filtering for discrete-time Markovian jump systems with parameter uncertainty , 2004 .

[12]  H.R. Pota,et al.  Decentralized control of power systems via robust control of uncertain Markov jump parameter systems , 2005, 2004 43rd IEEE Conference on Decision and Control (CDC) (IEEE Cat. No.04CH37601).

[13]  Yi Zhang,et al.  A self-learning call admission control scheme for CDMA cellular networks , 2005, IEEE Transactions on Neural Networks.

[14]  K. Cai,et al.  Mode-independent robust stabilization for uncertain Markovian jump nonlinear systems via fuzzy control , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[15]  Frank L. Lewis,et al.  Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach , 2005, Autom..

[16]  Warren B. Powell,et al.  Handbook of Learning and Approximate Dynamic Programming , 2006, IEEE Transactions on Automatic Control.

[17]  P.J. Werbos,et al.  Using ADP to Understand and Replicate Brain Intelligence: the Next Level Design , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.

[18]  Derong Liu,et al.  Adaptive Critic Learning Techniques for Engine Torque and Air–Fuel Ratio Control , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[19]  John S. Baras,et al.  Optimal state estimation for discrete-time Markovian Jump Linear Systems, in the presence of delayed output observations , 2008, ITW.

[20]  Huaguang Zhang,et al.  A Novel Infinite-Time Optimal Tracking Control Scheme for a Class of Discrete-Time Nonlinear Systems via the Greedy HDP Iteration Algorithm , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[21]  Frank L. Lewis,et al.  Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[22]  Lixian Zhang,et al.  Stability and stabilization of Markovian jump linear systems with partly unknown transition probabilities , 2009, Autom..

[23]  M. Gopal,et al.  Fixed final time optimal control approach for bounded robust controller design using Hamilton-Jacobi-Bellman solution , 2009 .

[24]  Paul J. Werbos,et al.  2009 Special Issue: Intelligence in the brain: A theory of how it works and how to build it , 2009 .

[25]  Frank L. Lewis,et al.  Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem , 2009, 2009 International Joint Conference on Neural Networks.

[26]  Fei Liu,et al.  Neural‐network‐based finite‐time H∞ control for extended Markov jump nonlinear systems , 2009 .

[27]  Huaguang Zhang,et al.  Neural-Network-Based Near-Optimal Control for a Class of Discrete-Time Affine Nonlinear Systems With Control Constraints , 2009, IEEE Transactions on Neural Networks.

[28]  J.C. Geromel,et al.  ${\cal H}_{\infty}$ Filtering of Discrete-Time Markov Jump Linear Systems Through Linear Matrix Inequalities , 2009, IEEE Transactions on Automatic Control.

[29]  Seungchul Lee,et al.  Maintenance Strategies for Manufacturing Systems using Markov Models , 2010 .

[30]  Haibo He,et al.  Adaptive Learning and Control for MIMO System Based on Adaptive Dynamic Programming , 2011, IEEE Transactions on Neural Networks.

[31]  Haibo He Self-Adaptive Systems for Machine Intelligence: He/Machine Intelligence , 2011 .

[32]  Derong Liu,et al.  A neural-network-based iterative GDHP approach for solving a class of nonlinear optimal control problems with control constraints , 2011, Neural Computing and Applications.

[33]  Haibo He Self-Adaptive Systems for Machine Intelligence , 2011 .

[34]  Frank L. Lewis,et al.  Reinforcement Learning for Partially Observable Dynamic Processes: Adaptive Dynamic Programming Using Measured Output Data , 2011, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[35]  Huaguang Zhang,et al.  An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games , 2011, Autom..

[36]  Jarkko Isotalo,et al.  The Cauchy–Schwarz Inequality , 2011 .

[37]  Qiuye Sun,et al.  Adaptive dynamic programming-based optimal control of unknown nonaffine nonlinear discrete-time systems with proof of convergence , 2012, Neurocomputing.

[38]  Haibo He,et al.  Reinforcement learning control based on multi-goal representation using hierarchical heuristic dynamic programming , 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).

[39]  Derong Liu,et al.  Optimal control for discrete-time affine non-linear systems using general value iteration , 2012 .

[40]  Feng Liu,et al.  A boundedness result for the direct heuristic dynamic programming , 2012, Neural Networks.

[41]  Derong Liu,et al.  Adaptive dynamic programming with stable value iteration algorithm for discrete-time nonlinear systems , 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).

[42]  Qinglai Wei,et al.  Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming , 2012, Autom..

[43]  Haibo He,et al.  A three-network architecture for on-line learning and optimization based on adaptive dynamic programming , 2012, Neurocomputing.

[44]  Derong Liu,et al.  Finite-horizon neuro-optimal tracking control for a class of discrete-time nonlinear systems using adaptive dynamic programming approach , 2012, Neurocomputing.

[45]  Frank L. Lewis,et al.  Reinforcement Learning and Approximate Dynamic Programming for Feedback Control , 2012 .

[46]  Derong Liu,et al.  Finite-Approximation-Error-Based Optimal Control Approach for Discrete-Time Nonlinear Systems , 2013, IEEE Transactions on Cybernetics.

[47]  Derong Liu,et al.  Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm , 2013, Neurocomputing.

[48]  Frank L. Lewis,et al.  Learning and Optimization in Hierarchical Adaptive Critic Design , 2013 .

[49]  Haibo He,et al.  Heuristic dynamic programming with internal goal representation , 2013, Soft Comput..

[50]  Haibo He,et al.  Robust controller design of continuous-time nonlinear system using neural network , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).

[51]  Derong Liu,et al.  Numerical adaptive learning control scheme for discrete-time non-linear systems , 2013 .

[52]  Haibo He,et al.  Goal Representation Heuristic Dynamic Programming on Maze Navigation , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[53]  Derong Liu,et al.  An iterative adaptive dynamic programming algorithm for optimal control of unknown discrete-time nonlinear systems with constrained inputs , 2013, Inf. Sci..

[54]  Jinyu Wen,et al.  Adaptive Learning in Tracking Control Based on the Dual Critic Network Design , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[55]  Zhen Ni,et al.  Experimental Studies on Data-Driven Heuristic Dynamic Programming for POMDP , 2014 .

[56]  Indra Narayan Kar,et al.  Bounded robust control of nonlinear systems using neural network–based HJB solution , 2011, IEEE Transactions on Automation Science and Engineering.