Adaptive Regularization in Neural Network Modeling

In this paper we address the important problem of optimizing regularization parameters in neural network modeling. The suggested optimization scheme is an extended version of the recently presented algorithm [24]. The idea is to minimize an empirical estimate of the generalization error, such as the cross-validation estimate, with respect to the regularization parameters. This is done with a simple iterative gradient descent scheme that requires virtually no programming overhead beyond standard training. Experiments with feed-forward neural network models for time series prediction and classification tasks demonstrate the viability and robustness of the algorithm. Moreover, we provide some simple theoretical examples that illustrate the potential and limitations of the proposed regularization framework.
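The abstract does not spell out how the gradient of the validation estimate with respect to a regularization parameter is obtained. The Python sketch below illustrates one way to do it in the simplest setting we can state with certainty: a linear model trained with weight decay, C(w) = mean((y - X w)^2) + kappa * ||w||^2, where a single hold-out validation error E_V stands in for the cross-validation estimate. Because the trained weights w(kappa) satisfy dC/dw = 0, implicit differentiation gives dw/dkappa = -(X^T X / N + kappa I)^(-1) w, and the chain rule then yields dE_V/dkappa. The parameter is updated by gradient descent on log(kappa), which keeps kappa positive. The function names (`fit_ridge`, `val_error`, `val_grad_kappa`, `adapt_kappa`), the log parameterization, and the step-halving safeguard are illustrative assumptions, not the authors' implementation; for a neural network the closed-form fit would be replaced by ordinary training and the matrix by the Hessian of the training cost.

```python
import numpy as np


def fit_ridge(X, y, kappa):
    """Minimize C(w) = mean((y - X w)^2) + kappa * ||w||^2 in closed form."""
    n, d = X.shape
    A = X.T @ X / n + kappa * np.eye(d)
    return np.linalg.solve(A, X.T @ y / n)


def val_error(Xv, yv, w):
    """Hold-out estimate of the generalization error (mean squared error)."""
    r = yv - Xv @ w
    return float(np.mean(r ** 2))


def val_grad_kappa(X, y, Xv, yv, kappa):
    """dE_V/dkappa via implicit differentiation of the condition dC/dw = 0."""
    n, d = X.shape
    w = fit_ridge(X, y, kappa)
    A = X.T @ X / n + kappa * np.eye(d)  # the Hessian of C is J = 2A
    # dw/dkappa = -J^{-1} d^2C/(dw dkappa) = -(2A)^{-1} (2w) = -A^{-1} w
    dw_dkappa = -np.linalg.solve(A, w)
    dEv_dw = -2.0 * Xv.T @ (yv - Xv @ w) / len(yv)
    return float(dEv_dw @ dw_dkappa)


def adapt_kappa(X, y, Xv, yv, kappa0=1.0, eta=1.0, steps=50):
    """Gradient descent on log(kappa); a step is halved until E_V does not increase."""
    log_k = np.log(kappa0)
    err = val_error(Xv, yv, fit_ridge(X, y, kappa0))
    for _ in range(steps):
        k = np.exp(log_k)
        g = k * val_grad_kappa(X, y, Xv, yv, k)  # chain rule: dE_V/dlog(kappa)
        step = eta
        while step > 1e-8:
            cand = log_k - step * g
            cand_err = val_error(Xv, yv, fit_ridge(X, y, np.exp(cand)))
            if cand_err <= err:  # accept only non-increasing validation error
                log_k, err = cand, cand_err
                break
            step *= 0.5
    return float(np.exp(log_k)), err
```

Because a step is accepted only when it does not increase E_V, the returned validation error can never exceed the error at the starting value `kappa0`; a finite-difference check of `val_grad_kappa` is a cheap way to confirm the gradient before adapting.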

References

[1] Elie Bienenstock, et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.

[2] Seymour Geisser, et al. The Predictive Sample Reuse Method with Applications, 1975.

[3] Lizhong Wu, et al. A Smoothing Regularizer for Feedforward and Recurrent Neural Networks, 1996, Neural Computation.

[4] John E. Dennis, et al. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, 1983, Prentice Hall Series in Computational Mathematics.

[5] L. K. Hansen, et al. Adaptive Regularization of Neural Classifiers, 1997, Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop.

[6] Klaus-Robert Müller, et al. Asymptotic Statistical Theory of Overtraining and Cross-Validation, 1997, IEEE Trans. Neural Networks.

[7] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.

[8] Michael Kearns, et al. A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split, 1995, Neural Computation.

[9] J. Larsen, et al. Design and Regularization of Neural Networks: The Optimal Use of a Validation Set, 1996, Neural Networks for Signal Processing VI. Proceedings of the 1996 IEEE Signal Processing Society Workshop.

[10] Jan Larsen, et al. A Generalization Error Estimate for Nonlinear Systems, 1992, Neural Networks for Signal Processing II. Proceedings of the 1992 IEEE Workshop.

[11] M. W. Pedersen, et al. Training Recurrent Networks, 1997, Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop.

[12] Cyril Goutte, et al. Note on Free Lunches and Cross-Validation, 1997, Neural Computation.

[13] Y. Le Cun, et al. Improving Generalization Performance in Character Recognition, 1991, Neural Networks for Signal Processing. Proceedings of the 1991 IEEE Workshop.

[14] Kurt Hornik, et al. Approximation Capabilities of Multilayer Feedforward Networks, 1991, Neural Networks.

[15] Jan Larsen, et al. Adaptive Regularization of Neural Networks Using Conjugate Gradient, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[16] Anders Krogh, et al. Introduction to the Theory of Neural Computation, 1994, The Advanced Book Program.

[17] M. Niranjan, et al. A Dynamic Neural Network Architecture by Sequential Partitioning of the Input Space, 1994, Neural Computation.

[18] John E. Moody, et al. Smoothing Regularizers for Projective Basis Function Networks, 1996, NIPS.

[19] J. Larsen, et al. Design and Evaluation of Neural Classifiers, 1996, Neural Networks for Signal Processing VI. Proceedings of the 1996 IEEE Signal Processing Society Workshop.

[20] Geoffrey E. Hinton, et al. Simplifying Neural Networks by Soft Weight-Sharing, 1992, Neural Computation.

[21] D. Lowe, et al. Adaptive Radial Basis Function Nonlinearities, and the Problem of Generalisation, 1989.

[22] M. Stone. Cross-Validatory Choice and Assessment of Statistical Predictions, 1976.

[23] Lars Kai Hansen, et al. Generalization Performance of Regularized Neural Network Models, 1994, Proceedings of IEEE Workshop on Neural Networks for Signal Processing.

[24] Lars Kai Hansen, et al. Designer Networks for Time Series Processing, 1993, Neural Networks for Signal Processing III. Proceedings of the 1993 IEEE-SP Workshop.

[25] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.

[26] Christopher M. Bishop, et al. Curvature-Driven Smoothing: A Learning Algorithm for Feedforward Networks, 1993, IEEE Trans. Neural Networks.

[27] Lars Kai Hansen, et al. Empirical Generalization Assessment of Neural Network Models, 1995, Proceedings of 1995 IEEE Workshop on Neural Networks for Signal Processing.

[28] Peter M. Williams, et al. Bayesian Regularization and Pruning Using a Laplace Prior, 1995, Neural Computation.

[29] Jan Larsen, et al. Design of Neural Network Filters, 1996.

[30] David E. Rumelhart, et al. Predicting the Future: A Connectionist Approach, 1990, Int. J. Neural Syst.

[31] Huaiyu Zhu, et al. No Free Lunch for Cross-Validation, 1996, Neural Computation.

[32] L. K. Hansen, et al. Adaptive Regularization, 1994, Proceedings of IEEE Workshop on Neural Networks for Signal Processing.

[33] David H. Wolpert, et al. The Mathematics of Search, 1996.

[34] Carl E. Rasmussen, et al. Pruning from Adaptive Regularization, 1994, Neural Computation.

[35] D. MacKay, et al. A Practical Bayesian Framework for Backprop Networks, 1991.

[36] G. E. Peterson, et al. Control Methods Used in a Study of the Vowels, 1951.

[37] Lars Kai Hansen, et al. Linear Unlearning for Cross-Validation, 1996, Adv. Comput. Math.

[38] H. Akaike. Fitting Autoregressive Models for Prediction, 1969.

[39] S. Amari, et al. Network Information Criterion: Determining the Number of Hidden Units for an Artificial Neural Network Model, 2007.

[40] Raymond L. Watrous. Current Status of Peterson-Barney Vowel Formant Data, 1991, The Journal of the Acoustical Society of America.

[41] Lars Kai Hansen, et al. Regularization with a Pruning Prior, 1997, Neural Networks.

[42] John Moody, et al. Prediction Risk and Architecture Selection for Neural Networks, 1994.

[43] J. Sjöberg. Non-Linear System Identification with Neural Networks, 1995.