Ankit Singh Rawat | Harikrishna Narasimhan | Sashank J. Reddi | Andrew Cotter | Aditya Krishna Menon | Yichen Zhou