Neural Network Compression via Additive Combination of Reshaped, Low-Rank Matrices

In the last five years, neural network compression has become an important problem due to the increasing need to run complex networks on small devices. We consider a form of network compression that has not been explored before: an additive combination of reshaped low-rank matrices. That is, given the weights of a neural network, we constrain them to be a sum of differently shaped low-rank matrices in order to reduce the network's size and inference cost. Computationally, this is a hard problem involving integer variables (the ranks) and continuous variables (the weights), as well as a nonlinear loss and nonlinear constraints. We formulate it as model selection over the family of compressed models and give an optimization algorithm that efficiently handles the inherent combinatorial structure. The result is a “Learning-Compression” algorithm that alternates between a standard machine learning step and a step involving signal compression. We demonstrate the effectiveness of the proposed compression scheme and the corresponding algorithm on multiple networks and datasets.
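To make the compression scheme concrete, the sketch below fits a layer's flattened weight vector with a sum of differently shaped low-rank matrices via block coordinate descent on the squared error. The shapes, ranks, function names, and fitting scheme here are illustrative assumptions, not the paper's exact formulation or algorithm.

```python
# Minimal sketch, assuming:
# - the layer's weights are flattened into a vector `w`,
# - additive term i reshapes `w` into a matrix of shape `shapes[i]`
#   (with prod(shapes[i]) == w.size) and is constrained to rank `ranks[i]`,
# - the terms are fit by block coordinate descent on the squared error.
import numpy as np

def truncated_svd(M, r):
    """Best rank-r approximation of M in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def additive_reshaped_lowrank(w, shapes, ranks, n_iters=20):
    """Approximate the flat vector w as a sum of terms, where term i is a
    rank-ranks[i] matrix of shape shapes[i], flattened back to a vector."""
    terms = [np.zeros_like(w) for _ in shapes]
    for _ in range(n_iters):
        for i, (shape, r) in enumerate(zip(shapes, ranks)):
            # Fit term i to the residual left by all the other terms.
            residual = w - sum(t for j, t in enumerate(terms) if j != i)
            terms[i] = truncated_svd(residual.reshape(shape), r).ravel()
    return sum(terms)

# Example: 4096 weights approximated by a rank-2 term reshaped to 64x64
# plus a rank-1 term reshaped to 16x256 (same flattened size).
w = np.random.randn(4096)
w_hat = additive_reshaped_lowrank(w, shapes=[(64, 64), (16, 256)], ranks=[2, 1])
print(np.linalg.norm(w - w_hat) / np.linalg.norm(w))  # relative fit error
```

Each inner update reduces to a reshape followed by a truncated SVD of the current residual, which is why differently shaped terms can capture structure that a single low-rank factorization of one fixed shape would miss.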
