There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average
Ben Athiwaratkun | Marc Finzi | Pavel Izmailov | Andrew Gordon Wilson
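The central idea behind the title, and behind the related stochastic weight averaging work in [33], is to average the weights visited along the SGD trajectory rather than keep only the final iterate. Below is a minimal, hypothetical sketch of that averaging step on a toy least-squares problem; it is not the authors' fast-SWA implementation, and the model, data, and hyperparameters are all assumptions made for illustration.

# Minimal sketch of stochastic weight averaging (SWA): maintain a running
# average of SGD iterates after a burn-in period. Toy setup only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))                 # toy inputs (hypothetical)
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=256)    # toy regression targets

w = np.zeros(10)                               # current SGD iterate
w_swa, n_avg = np.zeros(10), 0                 # running average of iterates
lr, burn_in, epochs = 0.05, 50, 200            # assumed hyperparameters

for epoch in range(epochs):
    for i in rng.permutation(len(X)):          # one SGD pass over the data
        grad = (X[i] @ w - y[i]) * X[i]        # squared-error gradient, one sample
        w -= lr * grad
    if epoch >= burn_in:                       # start averaging after burn-in
        w_swa = (n_avg * w_swa + w) / (n_avg + 1)
        n_avg += 1

print("final-iterate error:   ", np.linalg.norm(w - w_true))
print("averaged-weights error:", np.linalg.norm(w_swa - w_true))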
[1] Tolga Tasdizen,et al. Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning , 2016, NIPS.
[2] Colin Raffel,et al. Realistic Evaluation of Semi-Supervised Learning Algorithms , 2018, ICLR.
[3] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[4] Philip Bachman,et al. Learning with Pseudo-Ensembles , 2014, NIPS.
[5] Stefano Ermon,et al. A DIRT-T Approach to Unsupervised Domain Adaptation , 2018, ICLR.
[6] Bo Zhang,et al. Smooth Neighbors on Teacher Graphs for Semi-Supervised Learning , 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Stefan Winkler,et al. The Unusual Effectiveness of Averaging in GAN Training , 2018, ICLR.
[8] Timo Aila,et al. Temporal Ensembling for Semi-Supervised Learning , 2016, ICLR.
[9] Zoubin Ghahramani,et al. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML.
[10] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[11] Ethan Dyer,et al. Gradient Descent Happens in a Tiny Subspace , 2018, ArXiv.
[12] Xavier Gastaldi,et al. Shake-Shake regularization , 2017, ArXiv.
[13] Harri Valpola,et al. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results , 2017, ArXiv.
[14] Yi Zhang,et al. Stronger generalization bounds for deep nets via a compression approach , 2018, ICML.
[15] Jürgen Schmidhuber,et al. Flat Minima , 1997, Neural Computation.
[16] Il-Chul Moon,et al. Adversarial Dropout for Supervised and Semi-supervised Learning , 2017, AAAI.
[17] Geoffrey French,et al. Self-ensembling for visual domain adaptation , 2017, ICLR.
[18] Colin Raffel,et al. Realistic Evaluation of Deep Semi-Supervised Learning Algorithms , 2018, NeurIPS.
[19] Razvan Pascanu,et al. Sharp Minima Can Generalize For Deep Nets , 2017, ICML.
[20] Shin Ishii,et al. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[21] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[23] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[24] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[25] Oriol Vinyals,et al. Qualitatively characterizing neural network optimization problems , 2014, ICLR.
[26] Stefano Soatto,et al. Entropy-SGD: biasing gradient descent into wide valleys , 2016, ICLR.
[27] Yann Dauphin,et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks , 2017, ICLR.
[28] Antonio Torralba,et al. 80 Million Tiny Images: A Large Dataset for Non-parametric Object and Scene Recognition , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[29] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[30] David M. Blei,et al. Stochastic Gradient Descent as Approximate Bayesian Inference , 2017, J. Mach. Learn. Res..
[31] Jascha Sohl-Dickstein,et al. Sensitivity and Generalization in Neural Networks: an Empirical Study , 2018, ICLR.
[32] Guillermo Sapiro,et al. Robust Large Margin Deep Neural Networks , 2016, IEEE Transactions on Signal Processing.
[33] Andrew Gordon Wilson,et al. Averaging Weights Leads to Wider Optima and Better Generalization , 2018, UAI.
[34] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Restarts , 2016, ArXiv.
[35] Sivan Toledo,et al. Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix , 2011, JACM.