A pruning method for the recursive least squares algorithm

The recursive least squares (RLS) algorithm is an effective online training method for neural networks. However, its combination with weight decay and pruning has not been well studied. This paper elucidates how generalization ability can be improved by selecting an appropriate initial value for the error covariance matrix in the RLS algorithm. It also investigates how pruning of a trained network can benefit from the final value of the error covariance matrix. We find that the RLS algorithm is implicitly a weight decay method, in which the weight decay effect is controlled by the initial value of the error covariance matrix, and that the inverse of the error covariance matrix is approximately equal to the Hessian matrix of the network being trained. We therefore propose first training a neural network with the RLS algorithm and then removing unimportant weights based on this approximate Hessian matrix. Simulation results show that the approach is an effective method for training and pruning neural networks.
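As a rough illustration of the idea (a minimal sketch, not the authors' implementation), the following Python fragment trains a linear-in-weights model with standard RLS and then removes the weights with the smallest Optimal-Brain-Damage-style saliency, using the inverse of the final error covariance matrix as the Hessian approximation. The scaling factor of the initial covariance matrix plays the role of the implicit weight-decay parameter discussed above. All function names (`rls_step`, `prune_by_saliency`), the parameter `delta`, and the forgetting factor `lam` are assumptions introduced here for illustration.

```python
# Minimal sketch: RLS training of a linear-in-weights model, followed by
# pruning based on the inverse of the error covariance matrix, which is
# taken as an approximation of the Hessian of the training error.
import numpy as np

def rls_step(w, P, x, d, lam=1.0):
    """One RLS update for a linear model y = w @ x.
    P is the error covariance matrix; lam is the forgetting factor."""
    Px = P @ x
    k = Px / (lam + x @ Px)          # gain vector
    e = d - w @ x                    # a priori error
    w = w + k * e                    # weight update
    P = (P - np.outer(k, Px)) / lam  # covariance update
    return w, P

def prune_by_saliency(w, P, n_remove):
    """Zero the weights with the smallest saliency 0.5 * w_i^2 * H_ii,
    using H ~ inverse of P as the Hessian approximation (OBD-style)."""
    H = np.linalg.inv(P)
    saliency = 0.5 * w**2 * np.diag(H)
    idx = np.argsort(saliency)[:n_remove]
    w = w.copy()
    w[idx] = 0.0
    return w, idx

# Toy usage: fit d = w_true @ x + noise, then prune the two least salient weights.
rng = np.random.default_rng(0)
w_true = np.array([2.0, 0.0, -1.5, 0.01])
n = w_true.size
w = np.zeros(n)
delta = 100.0            # initial P = delta * I; larger delta => weaker implicit weight decay
P = delta * np.eye(n)
for _ in range(500):
    x = rng.normal(size=n)
    d = w_true @ x + 0.01 * rng.normal()
    w, P = rls_step(w, P, x, d)
w_pruned, removed = prune_by_saliency(w, P, n_remove=2)
print("trained:", np.round(w, 3), "pruned indices:", removed)
```

In this toy run the near-zero components of `w_true` receive the smallest saliencies and are removed first, which is the behavior the paper's Hessian-based pruning step relies on.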
