A neural-network-based methodology for the design of optimal controllers for nonlinear systems is presented. The overall architecture consists of two neural networks. The first neural network is a cost-to-go function approximator (CTGA), which is trained to predict the cost to go from the present state of the system. The second neural network converges to an optimal controller as it is trained to minimize the output of the first network. The CTGA can be trained using available simulation or experimental data; hence an explicit analytical model of the system is not required. The key to the success of the approach is giving the CTGA a special decentralized structure that makes its training relatively straightforward and allows its prediction quality to be carefully controlled. This structure eliminates many of the uncertainties often involved in using artificial neural networks for this type of application. The validity of the approach is illustrated for the optimal attitude control of a spacecraft with reaction wheels.

I. Introduction

Artificial neural networks have been investigated extensively in the optimal control of nonlinear systems. Control architectures known as adaptive critic designs (ACD) have been proposed for the optimal control problem.¹⁻³ ACD are based on the forward dynamic programming approach to optimization. The basic architecture consists of a critic, which models the cost-to-go function, and a controller. Both are parameterized using neural networks. The two networks are trained simultaneously, and each depends on the fidelity of the other to be trained properly, which can make the training of the overall system particularly challenging. An interesting solution approach has been presented recently, in which the critic and the controller are pretrained using linear models over the region of operation of the system, and an algebraic training procedure is employed in the initialization of the neural networks.⁴

In our previous work, a modified parametric optimization approach was developed to generate optimal controllers in both state feedback form and dynamic output feedback form for linear systems.⁵ In the present work, we generalize the approach to nonlinear systems. Parametric optimization fixes the form of the controller in advance and then finds the controller parameters that optimize the performance measure. If the controller is parameterized via neural networks, then, given their universal function-approximating capability, the true optimal controller can in theory be captured.⁶,⁷

One objective of our research is to remove many of the uncertainties associated with training a neural network architecture that results in an optimal controller. The key to the success of our method is to give the neural network a very special structure that permits tight control over its prediction quality during training. The special structure also makes it possible for each subsystem of the overall network to be trained independently of the other subsystems. The issue of interdependency among various portions of the overall network during training, encountered in the basic adaptive
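The two-stage idea described above (first fit a cost-to-go predictor from data, then tune a controller to minimize its output) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar linear plant (`A_TRUE`, `B_TRUE`), the quadratic cost weights (`Q`, `RHO`), the horizon (`HORIZON`), and the use of a quadratic least-squares model in place of a neural CTGA. It does not reproduce the decentralized CTGA structure. Because the toy approximator is quadratic, the controller is obtained in closed form here; a neural controller would instead be trained by gradient descent on the CTGA output.

```python
import numpy as np

# Hypothetical toy plant; used ONLY to generate training data (the "simulator").
A_TRUE, B_TRUE = 0.9, 0.5
Q, RHO, HORIZON = 1.0, 0.1, 3  # assumed cost weights and prediction horizon


def simulate_cost(x0, controls):
    """Roll the plant forward and return the accumulated quadratic cost."""
    x, cost = x0, 0.0
    for u in controls:
        cost += Q * x**2 + RHO * u**2
        x = A_TRUE * x + B_TRUE * u
    return cost + Q * x**2  # include the final state


def quad_features(z):
    """All monomials z_i * z_j with i <= j."""
    i, j = np.triu_indices(len(z))
    return z[i] * z[j]


def fit_ctga(n_samples=400, seed=0):
    """Stage 1: least-squares fit of cost-to-go ~ z^T P z, z = [x, u_0, ...]."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-1.0, 1.0, size=(n_samples, HORIZON + 1))
    targets = np.array([simulate_cost(z[0], z[1:]) for z in Z])
    Phi = np.array([quad_features(z) for z in Z])
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    # Unpack the monomial weights into a symmetric matrix P.
    n = HORIZON + 1
    P = np.zeros((n, n))
    i, j = np.triu_indices(n)
    P[i, j] = w
    return 0.5 * (P + P.T)


def controller_gains(P):
    """Stage 2: gains K such that u = K * x minimizes the predicted cost."""
    return -np.linalg.solve(P[1:, 1:], P[1:, 0])


P = fit_ctga()
K = controller_gains(P)

x0 = 0.8
u_plan = K * x0                      # planned control sequence from state x0
z = np.concatenate(([x0], u_plan))
predicted = z @ P @ z                # cost predicted by the fitted model
actual = simulate_cost(x0, u_plan)   # cost obtained from the simulator
```

Note that the controller stage never touches `A_TRUE` or `B_TRUE` directly; it relies only on the fitted cost-to-go model, which mirrors the claim that an explicit analytical model of the system is not required.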
[1] Martin A. Riedmiller, et al. A direct adaptive method for faster backpropagation learning: the RPROP algorithm. IEEE International Conference on Neural Networks, 1993.
[2] E. Mosca. Optimal, Predictive and Adaptive Control. 1994.
[3] George Cybenko, et al. Approximation by superpositions of a sigmoidal function. Math. Control. Signals Syst., 1992.
[4] S. N. Balakrishnan, et al. Adaptive-critic based neural networks for aircraft optimal control. 1996.
[5] Bong Wie, et al. Space Vehicle Dynamics and Control. 1998.
[6] Niels Kjølstad Poulsen, et al. Neural Networks for Modelling and Control of Dynamic Systems: A Practitioner’s Handbook. 2000.
[7] Anuradha M. Annaswamy, et al. Stable Adaptive Systems. 1989.
[8] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators. Neural Networks, 1989.
[9] Minh Q. Phan, et al. Data-Based Cost-To-Go Design for Optimal Control. 2002.
[10] Robert F. Stengel, et al. An adaptive critic global controller. Proceedings of the 2002 American Control Conference (IEEE Cat. No.CH37301), 2002.
[11] Graham C. Goodwin, et al. Adaptive filtering prediction and control. 1984.
[12] Mohammad Bagher Menhaj, et al. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Networks, 1994.
[13] Timothy Masters, et al. Multilayer Feedforward Networks. 1993.
[14] R. Offereins. Book review: Digital control system analysis and design. 1985.