Incremental Deep Neural Network Pruning Based on Hessian Approximation

In this paper, we propose an incremental pruning method based on a Hessian approximation to compress deep neural networks. The method starts from the idea of using the Hessian to measure the "importance" of each weight in a deep neural network, and its main contributions are as follows. First, we propose to use the second moment maintained by the Adam optimizer as a measure of the "importance" of each weight, which avoids computing the Hessian matrix explicitly. Second, we propose an incremental scheme that prunes the network step by step; after each pruning step, the remaining non-zero weights of the whole network are re-adjusted, which helps recover the performance of the pruned network. Last but not least, the proposed method applies an automatically generated global threshold to the weights of all layers, so that the pruning budget is allocated across layers automatically. This improves performance and removes the need to tune the pruning threshold layer by layer. We conduct experiments on MNIST and ImageNet with widely used networks such as AlexNet and VGG16. The results show that the proposed algorithm compresses the networks significantly with almost no loss of accuracy, demonstrating its effectiveness.
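
To illustrate the central idea, the following is a minimal PyTorch sketch of how Adam's second-moment estimate could serve as a per-weight importance score combined with a single global pruning threshold. It is an assumption-laden illustration, not the paper's exact procedure: the scoring rule v * w^2 (second moment as a Hessian-diagonal proxy), the use of PyTorch's Adam state key "exp_avg_sq", and the function name and prune_fraction parameter are all choices made here for exposition.

```python
import torch

def global_prune_by_adam_second_moment(model, optimizer, prune_fraction):
    """Zero out the weights with the smallest importance scores across all layers.

    Importance of each weight w is approximated here as v * w**2, where v is
    Adam's second-moment estimate (used as a stand-in for the Hessian diagonal).
    A single global threshold is taken over all layers, so per-layer pruning
    ratios emerge automatically.
    """
    scores, params = [], []
    for p in model.parameters():
        state = optimizer.state.get(p, {})
        if "exp_avg_sq" not in state:
            continue  # parameter has not been updated by Adam yet
        score = state["exp_avg_sq"] * p.detach() ** 2
        scores.append(score.flatten())
        params.append((p, score))

    # One global threshold: the prune_fraction quantile of all scores.
    threshold = torch.quantile(torch.cat(scores), prune_fraction)

    masks = []
    with torch.no_grad():
        for p, score in params:
            mask = (score > threshold).to(p.dtype)
            p.mul_(mask)  # zero out low-importance weights
            masks.append((p, mask))
    return masks
```

In an incremental setting, a routine like this would be invoked repeatedly with a growing prune_fraction, with fine-tuning of the surviving weights between pruning steps and the returned masks reapplied after each optimizer update so that pruned weights stay at zero.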