Adaptive regularization

Regularization, e.g., in the form of weight decay, is important for the training and optimization of neural network architectures. In this work the authors provide a tool, based on asymptotic sampling theory, for iterative estimation of weight-decay parameters. The basic idea is to perform gradient descent on the estimated generalization error with respect to the regularization parameters. The scheme is implemented in the authors' Designer Net framework for network training and pruning, i.e., it is based on the diagonal Hessian approximation. The scheme adds little computational overhead beyond what is already needed for training and pruning. The viability of the approach is demonstrated in an experiment on prediction of the chaotic Mackey-Glass series. The authors find that the optimized weight-decay parameters are relatively large for densely connected networks in the initial pruning phase and decrease as pruning proceeds.
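To make the idea concrete, the following is a minimal, hypothetical sketch (not the authors' exact estimator): a ridge-regularized linear model whose weight-decay parameter is adapted by gradient descent on an Akaike/FPE-style estimate of generalization error built from a diagonal Hessian approximation. All names, constants, and the toy data are assumptions for illustration; the gradient is taken by finite differences purely for brevity, whereas the paper derives an analytic expression.

import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (a stand-in for the Mackey-Glass prediction task).
N, p = 80, 20
X = rng.normal(size=(N, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.3 * rng.normal(size=N)

def train(lmbda):
    """Weight-decay (ridge) minimizer of E(w) = ||Xw - y||^2 / N + lmbda * ||w||^2."""
    return np.linalg.solve(X.T @ X / N + lmbda * np.eye(p), X.T @ y / N)

def estimated_generalization_error(lmbda):
    """FPE-style estimate: training error inflated by an effective number of
    parameters computed from the diagonal Hessian approximation."""
    w = train(lmbda)
    train_err = np.mean((X @ w - y) ** 2)
    h = np.diag(X.T @ X / N)            # diagonal Hessian of the data term
    p_eff = np.sum(h / (h + lmbda))     # effective number of parameters
    return train_err * (N + p_eff) / (N - p_eff)

# Gradient descent on log(lambda), with the derivative of the estimated
# generalization error taken by central finite differences.
log_lmbda, step, eps = np.log(1e-3), 2.0, 1e-4
for _ in range(200):
    g_plus = estimated_generalization_error(np.exp(log_lmbda + eps))
    g_minus = estimated_generalization_error(np.exp(log_lmbda - eps))
    grad = (g_plus - g_minus) / (2 * eps)   # d G_hat / d log(lambda)
    log_lmbda -= step * grad

print(f"adapted weight decay: {np.exp(log_lmbda):.4g}")
print(f"estimated generalization error: {estimated_generalization_error(np.exp(log_lmbda)):.4g}")

Descending in log(lambda) rather than lambda itself keeps the parameter positive and makes the step size scale-free; the same loop structure carries over when each weight group has its own decay parameter, as in the pruning experiments described above.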
