Pruning Convolutional Neural Networks with Self-Supervision

Convolutional neural networks trained without supervision come close to matching performance with supervised pre-training, but sometimes at the cost of an even higher number of parameters. Extracting subnetworks from these large unsupervised convnets with preserved performance is of particular interest to make them less computationally intensive. Typical pruning methods operate during training on a task while trying to maintain the performance of the pruned network on the same task. However, in self-supervised feature learning, the training objective is agnostic on the representation transferability to downstream tasks. Thus, preserving performance for this objective does not ensure that the pruned subnetwork remains effective for solving downstream tasks. In this work, we investigate the use of standard pruning methods, developed primarily for supervised learning, for networks trained without labels (i.e. on self-supervised tasks). We show that pruned masks obtained with or without labels reach comparable performance when retrained on labels, suggesting that pruning operates similarly for self-supervised and supervised learning. Interestingly, we also find that pruning preserves the transfer performance of self-supervised subnetwork representations.

[1]  Phillip Isola,et al.  Contrastive Multiview Coding , 2019, ECCV.

[2]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[3]  Nikos Komodakis,et al.  Unsupervised Representation Learning by Predicting Image Rotations , 2018, ICLR.

[4]  James A. Storer,et al.  RePr: Improved Training of Convolutional Filters , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[6]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Nitish Srivastava Unsupervised Learning of Visual Representations using Videos , 2015 .

[8]  Yingli Tian,et al.  Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Mingjie Sun,et al.  Rethinking the Value of Network Pruning , 2018, ICLR.

[10]  Yixin Chen,et al.  Compressing Neural Networks with the Hashing Trick , 2015, ICML.

[11]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[12]  Ali Razavi,et al.  Data-Efficient Image Recognition with Contrastive Predictive Coding , 2019, ICML.

[13]  R Devon Hjelm,et al.  Learning Representations by Maximizing Mutual Information Across Views , 2019, NeurIPS.

[14]  Andrew Zisserman,et al.  Multi-task Self-Supervised Visual Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Xin Dong,et al.  Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon , 2017, NIPS.

[16]  Jason Yosinski,et al.  Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask , 2019, NeurIPS.

[17]  Trevor Darrell,et al.  Learning Features by Watching Objects Move , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Yurong Chen,et al.  Dynamic Network Surgery for Efficient DNNs , 2016, NIPS.

[19]  Shin Ishii,et al.  Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Julien Mairal,et al.  Unsupervised Pre-Training of Image Features on Non-Curated Data , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[22]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[23]  Abhinav Gupta,et al.  Scaling and Benchmarking Self-Supervised Visual Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Frank Hutter,et al.  SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.

[25]  Armand Joulin,et al.  Unsupervised Learning by Predicting Noise , 2017, ICML.

[26]  Michael Carbin,et al.  The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks , 2018, ICLR.

[27]  Stella X. Yu,et al.  Unsupervised Feature Learning via Non-parametric Instance Discrimination , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[29]  Song Han,et al.  Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[30]  Hanan Samet,et al.  Pruning Filters for Efficient ConvNets , 2016, ICLR.

[31]  Max Welling,et al.  Soft Weight-Sharing for Neural Network Compression , 2017, ICLR.

[32]  Max Welling,et al.  Learning Sparse Neural Networks through L0 Regularization , 2017, ICLR.

[33]  Gintare Karolina Dziugaite,et al.  Linear Mode Connectivity and the Lottery Ticket Hypothesis , 2019, ICML.

[34]  Paolo Favaro,et al.  Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.

[35]  Dong-Hyun Lee,et al.  Pseudo-Label : The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks , 2013 .

[36]  Matthijs Douze,et al.  Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.

[37]  Yuandong Tian,et al.  One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers , 2019, NeurIPS.

[38]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[39]  Yuandong Tian,et al.  Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP , 2019, ICLR.

[40]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[41]  Thomas Brox,et al.  Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Yann LeCun,et al.  Optimal Brain Damage , 1989, NIPS.

[43]  Alexei A. Efros,et al.  Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).