Paper information - Second Order Derivatives for Network Pruning: Optimal Brain Surgeon

We investigate the use of information from all second order derivatives of the error function to perform network pruning (i.e., removing unimportant weights from a trained network) in order to improve generalization, simplify networks, reduce hardware or storage requirements, increase the speed of further training, and in some cases enable rule extraction. Our method, Optimal Brain Surgeon (OBS), is significantly better than magnitude-based methods and Optimal Brain Damage [Le Cun, Denker and Solla, 1990], which often remove the wrong weights. OBS permits the pruning of more weights than other methods (for the same error on the training set), and thus yields better generalization on test data. Crucial to OBS is a recursion relation for calculating the inverse Hessian matrix H⁻¹ from training data and structural information of the net. OBS permits a 90%, a 76%, and a 62% reduction in weights over backpropagation with weight decay on three benchmark MONK's problems [Thrun et al., 1991]. Of OBS, Optimal Brain Damage, and magnitude-based methods, only OBS deletes the correct weights from a trained XOR network in every case. Finally, whereas Sejnowski and Rosenberg [1987] used 18,000 weights in their NETtalk network, we used OBS to prune a network to just 1560 weights, yielding better generalization.
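The abstract names two ingredients without spelling them out: a recursion that builds the inverse Hessian from training data, and a rule for choosing which weight to delete. The following is a minimal sketch of both, under assumptions the abstract itself does not state. It assumes the outer-product form H = alpha*I + (1/P) * sum_k x_k x_k^T, where x_k is the gradient of the network output with respect to the weights for pattern k, and it uses the saliency L_q = w_q^2 / (2 [H⁻¹]_qq) commonly given for OBS. The function names `inverse_hessian` and `obs_step`, the parameter `alpha`, and the plain-list data layout are illustrative choices, not taken from the paper.

```python
def inverse_hessian(grads, alpha=1e-4):
    """Build H^{-1} one training pattern at a time.

    Assumption: H = alpha*I + (1/P) * sum_k x_k x_k^T, with x_k the
    per-pattern output gradient. Each pattern is a rank-one update, so the
    Sherman-Morrison identity updates the inverse without ever inverting H.
    """
    n, P = len(grads[0]), len(grads)
    # Start from the inverse of alpha*I.
    H_inv = [[(1.0 / alpha if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    for x in grads:
        # Hx = H_inv @ x (H_inv stays symmetric, so x^T H_inv = Hx^T).
        Hx = [sum(H_inv[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = P + sum(x[i] * Hx[i] for i in range(n))
        for i in range(n):
            for j in range(n):
                H_inv[i][j] -= Hx[i] * Hx[j] / denom
    return H_inv


def obs_step(w, H_inv):
    """Delete the lowest-saliency weight and adjust the remaining ones.

    Returns (index of the deleted weight, its saliency, updated weights).
    """
    n = len(w)
    # Saliency: estimated increase in error if weight q is removed.
    saliency = [w[q] ** 2 / (2.0 * H_inv[q][q]) for q in range(n)]
    q = min(range(n), key=saliency.__getitem__)
    # Update all weights: delta_w = -(w_q / [H^-1]_qq) * (column q of H^-1).
    scale = w[q] / H_inv[q][q]
    new_w = [w[i] - scale * H_inv[i][q] for i in range(n)]
    new_w[q] = 0.0  # exactly zero by construction; set explicitly
    return q, saliency[q], new_w
```

This sketch performs a single pruning step and does not track which weights were already removed. A fuller procedure would repeat the step, skipping deleted weights and recomputing the inverse Hessian as needed.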

Babak Hassibi | David G. Stork

[1] J. Rissanen et al. Modeling By Shortest Data Description, 1978, Autom.

[2] Geoffrey E. Hinton et al. Learning internal representations by error propagation, 1986.

[3] Terrence J. Sejnowski et al. Parallel Networks that Learn to Pronounce English Text, 1987, Complex Syst.

[4] Yann LeCun et al. Optimal Brain Damage, 1989, NIPS.

[5] Jude W. Shavlik et al. Interpretation of Artificial Neural Networks: Mapping Knowledge-Based Neural Networks into Rules, 1991, NIPS.

[6] Sun-Yuan Kung et al. A Frobenius approximation reduction method (FARM) for determining optimal number of hidden units, 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[7] Anders Krogh et al. Introduction to the theory of neural computation, 1994, The advanced book program.

[8] Gregory J. Wolff et al. Optimal Brain Surgeon and general network pruning, 1993, IEEE International Conference on Neural Networks.