Cycle Self-Training for Domain Adaptation

Mainstream approaches to unsupervised domain adaptation (UDA) learn domain-invariant representations to narrow the domain shift; they are empirically effective but theoretically challenged by hardness and impossibility theorems. Recently, self-training has been gaining momentum in UDA: it exploits unlabeled target data by training with target pseudo-labels. However, as corroborated in this work, pseudo-labels can be unreliable under distributional shift, deviating substantially from the target ground truth. In this paper, we propose Cycle Self-Training (CST), a principled self-training algorithm that explicitly enforces pseudo-labels to generalize across domains. CST cycles between a forward step and a reverse step until convergence. In the forward step, CST generates target pseudo-labels with a source-trained classifier. In the reverse step, CST trains a target classifier on the target pseudo-labels, and then updates the shared representations so that the target classifier performs well on the source data. We introduce the Tsallis entropy as a confidence-friendly regularizer to improve the quality of target pseudo-labels. We analyze CST theoretically under realistic assumptions and provide hard cases in which CST recovers the target ground truth while both invariant feature learning and vanilla self-training fail. Empirical results indicate that CST significantly improves over the state of the art on visual recognition and sentiment analysis benchmarks.
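
To make the forward and reverse steps concrete, below is a minimal PyTorch sketch of one CST cycle. The module names (features, head_src, head_tgt), the single-gradient-step updates in place of the paper's bi-level formulation, the hyperparameters alpha and lam, and the exact placement of the Tsallis regularizer are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of one Cycle Self-Training iteration (assumptions noted above).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def tsallis_entropy(probs, alpha=2.0):
        # Tsallis entropy S_alpha(p) = (1 - sum_i p_i^alpha) / (alpha - 1);
        # it recovers Shannon entropy as alpha -> 1. The value of alpha is assumed.
        return (1.0 - probs.pow(alpha).sum(dim=1)) / (alpha - 1.0)

    def cst_step(features, head_src, head_tgt, x_s, y_s, x_t,
                 opt_feat, opt_src, opt_tgt, alpha=2.0, lam=0.1):
        """One forward/reverse cycle on a labeled source batch (x_s, y_s) and an
        unlabeled target batch x_t; all optimizers and weights are illustrative."""
        # Forward step: fit the source classifier, then pseudo-label the target batch.
        loss_src = F.cross_entropy(head_src(features(x_s)), y_s)
        opt_feat.zero_grad(); opt_src.zero_grad()
        loss_src.backward()
        opt_feat.step(); opt_src.step()

        with torch.no_grad():
            pseudo_t = head_src(features(x_t)).argmax(dim=1)

        # Reverse step, part 1: train the target classifier on the pseudo-labels
        # (features are detached so only head_tgt is updated here).
        loss_tgt = F.cross_entropy(head_tgt(features(x_t).detach()), pseudo_t)
        opt_tgt.zero_grad(); loss_tgt.backward(); opt_tgt.step()

        # Reverse step, part 2: update the shared features so the target classifier
        # also performs well on labeled source data, plus a Tsallis-entropy term
        # that sharpens the source head's target predictions.
        loss_cycle = F.cross_entropy(head_tgt(features(x_s)), y_s)
        probs_t = F.softmax(head_src(features(x_t)), dim=1)
        loss_cycle = loss_cycle + lam * tsallis_entropy(probs_t, alpha).mean()
        opt_feat.zero_grad(); loss_cycle.backward(); opt_feat.step()

        return loss_src.item(), loss_tgt.item(), loss_cycle.item()

In a full training loop one would iterate cst_step over mini-batches from both domains until convergence; the paper's exact objective and optimization schedule differ, so this sketch only illustrates the cycle structure described in the abstract.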
