MixMatch: A Holistic Approach to Semi-Supervised Learning

Semi-supervised learning has proven to be a powerful paradigm for leveraging unlabeled data to mitigate the reliance on large labeled datasets. In this work, we unify the current dominant approaches for semi-supervised learning to produce a new algorithm, MixMatch, that works by guessing low-entropy labels for data-augmented unlabeled examples and mixing labeled and unlabeled data using MixUp. We show that MixMatch obtains state-of-the-art results by a large margin across many datasets and labeled data amounts. For example, on CIFAR-10 with 250 labels, we reduce error rate by a factor of 4 (from 38% to 11%) and by a factor of 2 on STL-10. We also demonstrate how MixMatch can help achieve a dramatically better accuracy-privacy trade-off for differential privacy. Finally, we perform an ablation study to tease apart which components of MixMatch are most important for its success.
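The abstract compresses the algorithm into a single sentence, so a short sketch may help make it concrete. Below is a minimal NumPy illustration of the three core steps named above: guessing labels by averaging predictions over several augmentations, sharpening those guesses to lower their entropy, and applying MixUp across the union of labeled and unlabeled batches. The `model` and `augment` functions are hypothetical stand-ins (a fixed linear softmax and additive noise), not the paper's network or image augmentations; the hyperparameters K=2, T=0.5, and alpha=0.75 follow the paper's reported defaults, but this is a sketch, not the reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 32, 10                        # toy feature dimension and class count
W = rng.standard_normal((D, C))      # fixed weights for the stand-in classifier

def model(x):
    """Stand-in classifier: softmax over a fixed linear projection."""
    logits = x @ W
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def augment(x):
    """Stand-in stochastic augmentation: small additive noise."""
    return x + 0.1 * rng.standard_normal(x.shape)

def sharpen(p, T):
    """Reduce entropy: Sharpen(p, T)_i = p_i^(1/T) / sum_j p_j^(1/T)."""
    p = p ** (1.0 / T)
    return p / p.sum(axis=1, keepdims=True)

def mixup(x1, p1, x2, p2, alpha):
    """MixUp with lambda' = max(lambda, 1 - lambda), so each mixed example
    stays closer to its first argument, as MixMatch requires."""
    lam = rng.beta(alpha, alpha, size=(len(x1), 1))
    lam = np.maximum(lam, 1.0 - lam)
    return lam * x1 + (1 - lam) * x2, lam * p1 + (1 - lam) * p2

def mixmatch_batch(x, y, u, K=2, T=0.5, alpha=0.75):
    """One MixMatch step: guess low-entropy labels for the unlabeled batch
    u, then MixUp both batches against a shuffled pool of all examples."""
    # Label guessing: average the model's predictions over K augmentations,
    # then sharpen the averaged distribution.
    q = sharpen(np.mean([model(augment(u)) for _ in range(K)], axis=0), T)
    x_hat = augment(x)                             # one augmentation of x
    u_hat = np.concatenate([augment(u) for _ in range(K)])
    q_hat = np.tile(q, (K, 1))                     # guessed labels, repeated
    # Shuffle the union of both batches to draw MixUp partners.
    pool_x = np.concatenate([x_hat, u_hat])
    pool_p = np.concatenate([y, q_hat])
    perm = rng.permutation(len(pool_x))
    pool_x, pool_p = pool_x[perm], pool_p[perm]
    n = len(x_hat)
    x_mix, p_mix = mixup(x_hat, y, pool_x[:n], pool_p[:n], alpha)
    u_mix, q_mix = mixup(u_hat, q_hat, pool_x[n:], pool_p[n:], alpha)
    return (x_mix, p_mix), (u_mix, q_mix)

# Toy usage: 4 labeled examples (one-hot labels) and 4 unlabeled examples.
x = rng.standard_normal((4, D))
y = np.eye(C)[rng.integers(0, C, size=4)]
u = rng.standard_normal((4, D))
(labeled_batch, unlabeled_batch) = mixmatch_batch(x, y, u)
```

Note the choice lambda' = max(lambda, 1 - lambda): it guarantees each mixed example remains closer to its original batch, which lets the labeled and unlabeled examples keep separate loss terms (cross-entropy for the labeled batch, a consistency penalty for the unlabeled one) after mixing.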
