Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees
[1] P. Chaudhari, et al. Does the Data Induce Capacity Control in Deep Learning?, 2021, ICML.
[2] Jong Wook Kim, et al. Robust fine-tuning of zero-shot models, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Hwanjun Song, et al. Learning From Noisy Labels With Deep Neural Networks: A Survey, 2020, IEEE Transactions on Neural Networks and Learning Systems.
[4] Edgar Dobriban, et al. Learning Augmentation Distributions using Transformed Risk Minimization, 2021, ArXiv.
[5] Dongyue Li, et al. Improved Regularization and Robustness for Fine-tuning in Neural Networks, 2021, NeurIPS.
[6] Pierre Alquier, et al. User-friendly introduction to PAC-Bayes bounds, 2021, ArXiv.
[7] Amir Globerson, et al. A Theoretical Analysis of Fine-tuning with Linear Teachers, 2021, NeurIPS.
[8] Qi Lei, et al. Near-Optimal Linear Regression under Distribution Shift, 2021, ICML.
[9] Hossein Azizpour, et al. Generalized Jensen-Shannon Divergence Loss for Learning with Noisy Labels, 2021, NeurIPS.
[10] Sanjeev Arora, et al. Technical perspective: Why don't today's deep nets overfit to their training data?, 2021, Commun. ACM.
[11] B. Recht, et al. Patterns, predictions, and actions: A story about machine learning, 2021, ArXiv.
[12] Renjie Liao, et al. A PAC-Bayesian Approach to Generalization Bounds for Graph Neural Networks, 2020, ICLR.
[13] Jingfei Du, et al. Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning, 2020, ICLR.
[14] S. Gelly, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2020, ICLR.
[15] Ariel Kleiner, et al. Sharpness-Aware Minimization for Efficiently Improving Generalization, 2020, ICLR.
[16] Gintare Karolina Dziugaite, et al. On the role of data in PAC-Bayes bounds, 2020, ArXiv.
[17] Massimiliano Pontil, et al. Distance-Based Regularisation of Deep Networks for Fine-Tuning, 2020, ICLR.
[18] Shivani Agarwal, et al. Learning from Noisy Labels with No Change to the Training Process, 2021, ICML.
[19] Eli Upfal, et al. Adversarial Multi Class Learning under Weak Supervision with Performance Guarantees, 2021, ICML.
[20] Weijie J. Su, et al. Analysis of Information Transfer from Heterogeneous Sources via Precise High-dimensional Asymptotics, 2020, ArXiv.
[21] Rong Ge, et al. Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks, 2020, ArXiv.
[22] Behnam Neyshabur, et al. What is being transferred in transfer learning?, 2020, NeurIPS.
[23] Sheng Liu, et al. Early-Learning Regularization Prevents Memorization of Noisy Labels, 2020, NeurIPS.
[24] James Bailey, et al. Normalized Loss Functions for Deep Learning with Noisy Labels, 2020, ICML.
[25] Michael I. Jordan, et al. On the Theory of Transfer Learning: The Importance of Task Diversity, 2020, NeurIPS.
[26] Gang Niu, et al. Dual T: Reducing Estimation Error for Transition Matrix in Label-noise Learning, 2020, NeurIPS.
[27] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[28] Subhransu Maji, et al. Exploring and Predicting Transferability across NLP Tasks, 2020, EMNLP.
[29] Sen Wu, et al. On the Generalization Effects of Linear Transformations in Data Augmentation, 2020, ICML.
[30] Colin Wei, et al. Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin, 2020, ICLR.
[31] Sen Wu, et al. Understanding and Improving Information Transfer in Multi-Task Learning, 2020, ICLR.
[32] Martin J. Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint, 2019, Cambridge University Press.
[33] Aditya Krishna Menon, et al. Does label smoothing mitigate label noise?, 2020, ICML.
[34] Chao Zhang, et al. Self-Adaptive Training: beyond Empirical Risk Minimization, 2020, NeurIPS.
[35] David Berthelot, et al. FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence, 2020, NeurIPS.
[36] Michael W. Mahoney, et al. PyHessian: Neural Networks Through the Lens of the Hessian, 2019, 2020 IEEE International Conference on Big Data (Big Data).
[37] Hossein Mobahi, et al. Fantastic Generalization Measures and Where to Find Them, 2019, ICLR.
[38] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[39] Philip M. Long, et al. Generalization bounds for deep convolutional neural networks, 2019, ICLR.
[40] Samet Oymak, et al. Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks, 2019, AISTATS.
[41] Masashi Sugiyama, et al. Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks using PAC-Bayesian Analysis, 2019, ICML.
[42] Steve Hanneke, et al. On the Value of Target Data in Transfer Learning, 2020, NeurIPS.
[43] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[44] Takuya Akiba, et al. Optuna: A Next-generation Hyperparameter Optimization Framework, 2019, KDD.
[45] Geoffrey E. Hinton, et al. When Does Label Smoothing Help?, 2019, NeurIPS.
[46] Gang Niu, et al. Are Anchor Points Really Indispensable in Label-Noise Learning?, 2019, NeurIPS.
[47] J. Zico Kolter, et al. Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience, 2019, ICLR.
[48] Michael I. Jordan, et al. A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm, 2019, ArXiv.
[49] Xiaodong Liu, et al. Multi-Task Deep Neural Networks for Natural Language Understanding, 2019, ACL.
[50] Shankar Krishnan, et al. An Investigation into Neural Net Optimization via Hessian Eigenvalue Density, 2019, ICML.
[51] Haoyi Xiong, et al. DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks, 2019, ICLR.
[52] Vardan Papyan, et al. Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians, 2019, ICML.
[53] Benjamin Guedj, et al. A Primer on PAC-Bayesian Learning, 2019, ICML.
[54] J. Zico Kolter, et al. Generalization in Deep Networks: The Role of Distance from Initialization, 2019, ArXiv.
[55] Bo Wang, et al. Moment Matching for Multi-Source Domain Adaptation, 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[56] Yann LeCun, et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks, 2018, ArXiv.
[57] Ryan P. Adams, et al. Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach, 2018, ICLR.
[58] Peter L. Bartlett, et al. Nearly-tight VC-dimension and Pseudodimension Bounds for Piecewise Linear Neural Networks, 2017, J. Mach. Learn. Res.
[59] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[60] Vardan Papyan, et al. The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size, 2018.
[61] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[62] Yi Zhang, et al. Stronger generalization bounds for deep nets via a compression approach, 2018, ICML.
[63] Xuhong Li, et al. Explicit Inductive Bias for Transfer Learning with Convolutional Networks, 2018, ICML.
[64] Pierre Vandergheynst, et al. PAC-Bayesian Margin Bounds for Convolutional Neural Networks, 2018.
[65] Anima Anandkumar, et al. Learning From Noisy Singly-labeled Data, 2017, ICLR.
[66] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization, 2017, ICLR.
[67] David A. McAllester, et al. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks, 2017, ICLR.
[68] Rico Sennrich, et al. Regularization techniques for fine-tuning in neural machine translation, 2017, EMNLP.
[69] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[70] Sebastian Ruder, et al. An Overview of Multi-Task Learning in Deep Neural Networks, 2017, ArXiv.
[71] Gintare Karolina Dziugaite, et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data, 2017, UAI.
[72] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[73] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[74] Richard Nock, et al. Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[75] Nagarajan Natarajan, et al. Cost-Sensitive Learning with Noisy Labels, 2017, J. Mach. Learn. Res.
[76] Yann LeCun, et al. Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond, 2016, ArXiv.
[77] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016.
[78] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[79] Dacheng Tao, et al. Classification with Noisy Labels by Importance Reweighting, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[80] Joel A. Tropp, et al. An Introduction to Matrix Concentration Inequalities, 2015, Found. Trends Mach. Learn.
[81] Nagarajan Natarajan, et al. Learning with Noisy Labels, 2013, NIPS.
[82] David A. McAllester. A PAC-Bayesian Tutorial with A Dropout Bound, 2013, ArXiv.
[83] Koby Crammer, et al. A theory of learning from different domains, 2010, Machine Learning.
[84] Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images, 2009.
[85] Shai Ben-David, et al. A notion of task relatedness yielding provable multiple-task learning guarantees, 2008, Machine Learning.
[86] O. Catoni. PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, 2007, ArXiv.
[87] Y. Yao, et al. On Early Stopping in Gradient Descent Learning, 2007.
[88] Koby Crammer, et al. Learning from Multiple Sources, 2006, NIPS.
[89] D. Angluin, et al. Learning From Noisy Examples, 1988, Machine Learning.
[90] Shai Ben-David, et al. A theoretical framework for learning from a pool of disparate data sources, 2002, KDD.
[91] Shang-Hua Teng, et al. Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time, 2001, STOC '01.
[92] Sanjoy Dasgupta, et al. PAC Generalization Bounds for Co-training, 2001, NIPS.
[93] David A. McAllester. PAC-Bayesian model averaging, 1999, COLT '99.
[94] David A. McAllester. Some PAC-Bayesian Theorems, 1998, COLT '98.