论文信息 - Boosting Self-Supervised Learning via Knowledge Transfer

Boosting Self-Supervised Learning via Knowledge Transfer

In self-supervised learning, one trains a model to solve a so-called pretext task on a dataset without the need for human annotation. The main objective, however, is to transfer this model to a target domain and task. Currently, the most effective transfer strategy is fine-tuning, which restricts one to use the same model or parts thereof for both pretext and target tasks. In this paper, we present a novel framework for self-supervised learning that overcomes limitations in designing and comparing different tasks, models, and data domains. In particular, our framework decouples the structure of the self-supervised model from the final task-specific fine-tuned model. This allows us to: 1) quantitatively assess previously incompatible models including handcrafted features; 2) show that deeper neural network models can learn better representations from the same pretext task; 3) transfer knowledge learned with a deep model to a shallower one and thus boost its learning. We use this framework to design a novel self-supervised task, which achieves state-of-the-art performance on the common benchmarks in PASCAL VOC 2007, ILSVRC12 and Places by a significant margin. Our learned features shrink the mAP gap between models trained via self-supervised learning and supervised learning from 5.9% to 2.6% in object detection on PASCAL VOC 2007.

[1] Nikos Komodakis,et al. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer , 2016, ICLR.

[2] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[3] Martial Hebert,et al. Learning from Small Sample Sets by Combining Unsupervised Meta-Training with CNNs , 2016, NIPS.

[4] Alexei A. Efros,et al. What makes ImageNet good for transfer learning? , 2016, ArXiv.

[5] Bolei Zhou,et al. Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[6] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.

[7] Björn Ommer,et al. LSTM Self-Supervision for Detailed Behavior Analysis , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Rich Caruana,et al. Model compression , 2006, KDD '06.

[9] Martial Hebert,et al. Few-Shot Hash Learning for Image Retrieval , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[10] Kristen Grauman,et al. Learning Image Representations Tied to Ego-Motion , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11] Iasonas Kokkinos,et al. Deep Filter Banks for Texture Recognition, Description, and Segmentation , 2015, International Journal of Computer Vision.

[12] Rich Caruana,et al. Do Deep Nets Really Need to be Deep? , 2013, NIPS.

[13] Trevor Darrell,et al. Data-dependent Initializations of Convolutional Neural Networks , 2015, ICLR.

[14] Martial Hebert,et al. Learning to Learn: Model Regression Networks for Easy Small Sample Learning , 2016, ECCV.

[15] Nitish Srivastava. Unsupervised Learning of Visual Representations using Videos , 2015 .

[16] Trevor Darrell,et al. Adversarial Feature Learning , 2016, ICLR.

[17] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18] G. G. Stokes. "J." , 1890, The New Yale Book of Quotations.

[19] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[20] Andrew Zisserman,et al. Multi-task Self-Supervised Visual Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21] Jitendra Malik,et al. Learning to See by Moving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.

[23] Alexei A. Efros,et al. Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[25] Abhinav Gupta,et al. Transitive Invariance for Self-Supervised Visual Representation Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26] Trevor Darrell,et al. Learning Features by Watching Objects Move , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[28] Jun'ichi Tsujii,et al. A discriminative language model with pseudo-negative samples , 2007, ACL.

[29] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[30] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[31] Jürgen Schmidhuber,et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010) , 2010, IEEE Transactions on Autonomous Mental Development.

[32] Alexei A. Efros,et al. Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Paolo Favaro,et al. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.

[34] Martial Hebert,et al. Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification , 2016, ECCV.

[35] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[36] Gregory Shakhnarovich,et al. Learning Representations for Automatic Colorization , 2016, ECCV.

[37] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[38] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[39] Paolo Favaro,et al. Representation Learning by Learning to Count , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[40] Alexei A. Efros,et al. Colorful Image Colorization , 2016, ECCV.

[41] Thomas Brox,et al. Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.