Multiple Pretext-Task for Self-Supervised Learning via Mixing Multiple Image Transformations

Self-supervised learning is one of the most promising approaches for learning representations that capture semantic features in images without any manual annotation cost. To learn useful representations, a self-supervised model solves a pretext task defined by the data itself. Among the many pretext tasks, the rotation prediction task (Rotation) yields strong representations for a variety of target tasks despite its simple implementation. However, we found that Rotation can fail to capture semantic features related to image textures and colors. To tackle this problem, we introduce a learning technique called multiple pretext-task for self-supervised learning (MP-SSL), which simultaneously solves multiple pretext tasks in addition to Rotation. To capture texture and color features, we employ image-enhancement transformations (e.g., sharpening and solarizing) as the additional pretext tasks. MP-SSL trains a model efficiently by leveraging a Frank-Wolfe based multi-task training algorithm. Our experimental results show that MP-SSL models outperform Rotation on multiple standard benchmarks and achieve state-of-the-art performance on Places-205.
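
The following is a minimal, hypothetical sketch of the idea described above, assuming a PyTorch setup. The toy backbone, head names, enhancement set, and helper functions are illustrative assumptions, not the paper's implementation: a shared backbone feeds a rotation-prediction head and an enhancement-prediction head, and the two pretext losses are mixed with a Frank-Wolfe-style (min-norm) weighting over the shared gradients in the spirit of multi-objective multi-task learning.

# Hypothetical sketch of multi-pretext-task training (names and backbone are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.transforms import functional as TF


class MultiPretextNet(nn.Module):
    def __init__(self, feat_dim=128, n_rot=4, n_enh=3):
        super().__init__()
        self.backbone = nn.Sequential(                 # toy backbone; a ResNet in practice
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.rot_head = nn.Linear(feat_dim, n_rot)     # rotation prediction (0/90/180/270)
        self.enh_head = nn.Linear(feat_dim, n_enh)     # enhancement prediction

    def forward(self, x):
        h = self.backbone(x)
        return self.rot_head(h), self.enh_head(h)


def make_pretext_batch(images):
    """Apply a random rotation and a random enhancement; labels are the applied transforms."""
    enhancements = [
        lambda im: im,                                 # identity
        lambda im: TF.adjust_sharpness(im, 2.0),       # sharpen
        lambda im: TF.solarize(im, 0.5),               # solarize (float images in [0, 1])
    ]
    xs, rot_y, enh_y = [], [], []
    for im in images:
        r = torch.randint(0, 4, (1,)).item()
        e = torch.randint(0, len(enhancements), (1,)).item()
        xs.append(torch.rot90(enhancements[e](im), k=r, dims=(1, 2)))
        rot_y.append(r)
        enh_y.append(e)
    return torch.stack(xs), torch.tensor(rot_y), torch.tensor(enh_y)


def min_norm_alpha(g1, g2, eps=1e-12):
    """Closed-form two-task Frank-Wolfe step: the weight alpha on task 1 that
    minimizes ||alpha * g1 + (1 - alpha) * g2||^2, clipped to [0, 1]."""
    alpha = torch.dot(g2 - g1, g2) / (torch.dot(g1 - g2, g1 - g2) + eps)
    return alpha.clamp(0.0, 1.0)


def train_step(model, optimizer, images):
    x, rot_y, enh_y = make_pretext_batch(images)
    rot_logits, enh_logits = model(x)
    loss_rot = F.cross_entropy(rot_logits, rot_y)
    loss_enh = F.cross_entropy(enh_logits, enh_y)

    # Per-task gradients w.r.t. the shared backbone, used only to compute the mixing weight.
    shared = [p for p in model.backbone.parameters() if p.requires_grad]
    g_rot = torch.cat([g.flatten() for g in
                       torch.autograd.grad(loss_rot, shared, retain_graph=True)])
    g_enh = torch.cat([g.flatten() for g in
                       torch.autograd.grad(loss_enh, shared, retain_graph=True)])
    alpha = min_norm_alpha(g_rot, g_enh).detach()

    optimizer.zero_grad()
    (alpha * loss_rot + (1.0 - alpha) * loss_enh).backward()
    optimizer.step()
    return loss_rot.item(), loss_enh.item(), alpha.item()


if __name__ == "__main__":
    model = MultiPretextNet()
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    dummy = [torch.rand(3, 32, 32) for _ in range(8)]  # stand-in for unlabeled images
    print(train_step(model, opt, dummy))

The closed-form weight above only covers the two-task case; with more pretext tasks, an iterative Frank-Wolfe solver over the task gradients would replace it.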
