Self-Tuning for Data-Efficient Deep Learning

Deep learning has made revolutionary advances in diverse applications given large-scale labeled datasets. However, collecting sufficient labeled data is prohibitively time-consuming and labor-intensive in most realistic scenarios. To mitigate the requirement for labeled data, semi-supervised learning (SSL) focuses on jointly exploring labeled and unlabeled data, while transfer learning (TL) popularizes the favorable practice of fine-tuning a pre-trained model on the target data. A dilemma thus arises: without a decent pre-trained model to provide implicit regularization, SSL through self-training from scratch is easily misled by inaccurate pseudo-labels, especially in a large label space; without exploring the intrinsic structure of unlabeled data, TL through fine-tuning on limited labeled data risks under-transfer caused by model shift. To escape this dilemma, we present Self-Tuning, which enables data-efficient deep learning by unifying the exploration of labeled and unlabeled data with the transfer of a pre-trained model, together with a Pseudo Group Contrast (PGC) mechanism that mitigates the reliance on pseudo-labels and boosts tolerance to false labels. Self-Tuning outperforms its SSL and TL counterparts on five tasks by sharp margins, e.g., doubling the accuracy of fine-tuning on Cars with 15% labels.
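
One way to realize a pseudo-group-contrast objective, consistent with the abstract's description, is to let the classifier's pseudo-label select a whole group of queued key features from the same predicted class as positives, with keys queued under every other class as negatives; averaging over the positive group keeps a single wrong pseudo-label from dominating the loss. The sketch below illustrates this idea in Python (PyTorch). The function name pgc_loss, the per-class queue layout, and the temperature value are illustrative assumptions, not the authors' released code.

```python
import torch

def pgc_loss(query, key, pseudo_labels, queue, temperature=0.07):
    """Contrastive loss over groups of pseudo-labeled positives (sketch).

    query, key:    [B, D]    L2-normalized features of two views of each image.
    pseudo_labels: [B]       class indices predicted by the classifier head.
    queue:         [C, K, D] L2-normalized key features stored per class.
    """
    C, K, _ = queue.shape
    B = query.size(0)

    # Similarity of every query to every queued key, per class: [B, C, K]
    logits = torch.einsum("bd,ckd->bck", query, queue) / temperature
    # Similarity of each query to its own paired key: [B, 1]
    pos_pair = (query * key).sum(dim=1, keepdim=True) / temperature

    losses = []
    for i in range(B):
        c = pseudo_labels[i]
        # Positive group: the paired key plus all keys queued under the
        # same pseudo-class, so one wrong pseudo-label cannot dominate.
        positives = torch.cat([pos_pair[i], logits[i, c]])        # [1 + K]
        # Negatives: keys queued under every other class.
        neg_mask = torch.ones(C, dtype=torch.bool, device=query.device)
        neg_mask[c] = False
        negatives = logits[i, neg_mask].reshape(-1)               # [(C - 1) * K]
        # Average the log-probability over the whole positive group,
        # spreading the contrastive signal across many (mostly correct) keys.
        log_prob = positives - torch.logsumexp(
            torch.cat([positives, negatives]), dim=0)
        losses.append(-log_prob.mean())
    return torch.stack(losses).mean()
```

In practice the per-class queue would be refreshed each step with momentum-encoded keys, and the pseudo-labels would come from a classifier trained jointly on the labeled data; both details are omitted from this sketch.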
