Control of unknown nonlinear systems with efficient transient performance using concurrent exploitation and exploration

Learning mechanisms that operate in unknown environments should be able to deal efficiently with the problem of controlling unknown dynamical systems. Many approaches to this problem face the so-called exploitation-exploration dilemma, in which the controller has to sacrifice efficient performance for the sake of learning “better” control strategies than the ones already known. In this paper we show that, when the control goal is to stabilize an unknown dynamical system by means of state feedback, exploitation and exploration can be performed concurrently. This is made possible by an appropriate combination of recent results developed by the author in the areas of adaptive control and adaptive optimization with a new result on the convex construction of Control Lyapunov Functions (CLFs) for nonlinear systems. The resulting scheme guarantees arbitrarily good performance outside the regions where the system is uncontrollable. Theoretical analysis verifies this claim.
