Rank-adaptive spectral pruning of convolutional layers during training

The computing cost and memory demand of deep learning pipelines have grown rapidly in recent years, and a variety of pruning techniques have therefore been developed to reduce the number of model parameters. Most of these techniques focus on reducing inference costs by pruning the network after a full pass of training. A smaller number of methods address the reduction of training costs, mostly by compressing the network via low-rank layer factorizations. Despite their efficiency for linear layers, these methods fail to handle convolutional filters effectively. In this work, we propose a low-parametric training method that factorizes the convolutions into tensor Tucker format and adaptively prunes the Tucker ranks of the convolutional kernels during training. Leveraging fundamental results from the geometric integration theory of differential equations on tensor manifolds, we obtain a robust training algorithm that provably approximates the full baseline performance and guarantees loss descent. Experiments against the full model and alternative low-rank baselines show that the proposed method drastically reduces training costs while achieving performance comparable to or better than the full baseline, and consistently outperforms competing low-rank approaches.
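
To make the factorization concrete, the sketch below shows a Tucker-style convolutional layer in PyTorch, where the 4D kernel is replaced by two 1x1 channel projections around a small spatial core. The class name `TuckerConv2d`, the fixed ranks `rank_in`/`rank_out`, and the three-convolution layout are illustrative assumptions rather than the paper's implementation, and the rank adaptation performed during training is omitted here for brevity.

```python
import torch
import torch.nn as nn


class TuckerConv2d(nn.Module):
    """Illustrative Tucker-factorized convolution (hypothetical sketch, not the paper's code).

    The full kernel of shape (C_out, C_in, k, k) is replaced by a 1x1 input
    projection, a small k x k core convolution, and a 1x1 output projection,
    with ranks (rank_in, rank_out) along the two channel modes. The paper
    additionally adapts these ranks during training; they are fixed here.
    """

    def __init__(self, in_channels, out_channels, kernel_size,
                 rank_in, rank_out, stride=1, padding=0):
        super().__init__()
        # Channel-mode factor: project C_in down to rank_in channels.
        self.project_in = nn.Conv2d(in_channels, rank_in, kernel_size=1, bias=False)
        # Core tensor: small spatial convolution between the two reduced channel spaces.
        self.core = nn.Conv2d(rank_in, rank_out, kernel_size,
                              stride=stride, padding=padding, bias=False)
        # Channel-mode factor: expand rank_out back up to C_out channels.
        self.project_out = nn.Conv2d(rank_out, out_channels, kernel_size=1, bias=True)

    def forward(self, x):
        return self.project_out(self.core(self.project_in(x)))


# Usage: a rank-(16, 32) replacement for a 128 -> 256 channel 3x3 convolution.
layer = TuckerConv2d(128, 256, kernel_size=3, rank_in=16, rank_out=32, padding=1)
x = torch.randn(8, 128, 32, 32)
print(layer(x).shape)  # torch.Size([8, 256, 32, 32])
```

With these (illustrative) ranks, the full 128 -> 256 channel 3x3 kernel of about 128·256·9 ≈ 295k parameters is replaced by roughly 128·16 + 16·32·9 + 32·256 ≈ 15k parameters, which is where the training-cost reduction comes from; choosing the ranks adaptively during training is the subject of the method described above.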
