Improving Generalization Capabilities of Dynamic Neural Networks

This work addresses the problem of improving the generalization capabilities of continuous recurrent neural networks. The learning task is transformed into an optimal control framework in which the weights and the initial network state are treated as unknown controls. A new learning algorithm based on a variational formulation of Pontryagin's maximum principle is proposed, and its convergence is discussed under reasonable assumptions. Numerical examples demonstrate a substantial improvement in the generalization capabilities of a dynamic network after the learning process.
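For concreteness, the optimal control view described above can be summarized as follows. This is a minimal sketch of a standard Pontryagin-type formulation, assuming generic network dynamics f, a quadratic tracking cost against a target trajectory d(t), and notation (x, w, p, H) introduced here for illustration rather than taken from the paper itself.

% Sketch (assumed generic form): the weights w and the initial state x_0 act as controls.
\begin{align}
  \dot{x}(t) &= f\bigl(x(t),\, w\bigr), \qquad x(0) = x_0,
    && \text{(network dynamics)} \\
  J(w, x_0)  &= \int_0^T \bigl\| x(t) - d(t) \bigr\|^2 \, dt,
    && \text{(tracking cost)} \\
  H(x, p, w, t) &= p^{\top} f(x, w) - \bigl\| x - d(t) \bigr\|^2,
    && \text{(Hamiltonian)} \\
  \dot{p}(t) &= -\frac{\partial H}{\partial x}, \qquad p(T) = 0,
    && \text{(adjoint equation)} \\
  \nabla_{w} J &= -\int_0^T \frac{\partial H}{\partial w}\, dt, \qquad
  \nabla_{x_0} J = -p(0).
    && \text{(control gradients)}
\end{align}

Under this convention, Pontryagin's maximum principle requires the optimal controls to maximize H along the optimal trajectory; a variational learning step then adjusts both the weights w and the initial state x_0 using the gradients in the last line.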
