Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee
Zhiyuan Li | Wei Hu | Dingli Yu