Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data
Michael W. Mahoney | Charles H. Martin | Yaoqing Yang | K. Ramchandran | Ryan Theisen | Liam Hodgkinson | Joseph Gonzalez
[1] Kannan Ramchandran, et al. Taxonomizing local versus global structure in neural network loss landscapes, 2021, NeurIPS.
[2] Weizhe Yuan, et al. BARTScore: Evaluating Generated Text as Text Generation, 2021, NeurIPS.
[3] Michael W. Mahoney, et al. Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics, 2021, ArXiv.
[4] Xinyu Gong, et al. Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective, 2021, ICLR.
[5] Samy Bengio, et al. Understanding deep learning (still) requires rethinking generalization, 2021, Commun. ACM.
[6] Hossein Mobahi, et al. NeurIPS 2020 Competition: Predicting Generalization in Deep Learning, 2020, ArXiv.
[7] Ioannis Mitliagkas, et al. In Search of Robust Measures of Generalization, 2020, NeurIPS.
[8] Ariel Kleiner, et al. Sharpness-Aware Minimization for Efficiently Improving Generalization, 2020, ICLR.
[9] Yelong Shen, et al. A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation, 2020, ArXiv.
[10] Michael W. Mahoney, et al. Boundary thickness and robustness in learning models, 2020, NeurIPS.
[11] Michael W. Mahoney, et al. Multiplicative noise and heavy tails in stochastic optimization, 2020, ICML.
[12] Guokun Lai, et al. Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing, 2020, NeurIPS.
[13] Colin Wei, et al. Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin, 2020, ICLR.
[14] Jiawei Han, et al. Understanding the Difficulty of Training Transformers, 2020, EMNLP.
[15] Michael W. Mahoney, et al. Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data, 2020, Nature Communications.
[16] Tomohide Shibata. Understand It in 5 Minutes!? Skim-Reading Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2020.
[17] Tie-Yan Liu, et al. On Layer Normalization in the Transformer Architecture, 2020, ICML.
[18] B. Lecouteux, et al. FlauBERT: Unsupervised Language Model Pre-training for French, 2019, LREC.
[19] Hossein Mobahi, et al. Fantastic Generalization Measures and Where to Find Them, 2019, ICLR.
[20] Boaz Barak, et al. Deep double descent: where bigger models and more data hurt, 2019, ICLR.
[21] Jianfeng Gao, et al. DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation, 2019, ACL.
[22] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[23] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[24] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[25] Ming-Wei Chang, et al. Well-Read Students Learn Better: On the Importance of Pre-training Compact Models, 2019.
[26] Balaji Lakshminarayanan, et al. Deep Ensembles: A Loss Landscape Perspective, 2019, ArXiv.
[27] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[28] Vitaly Feldman, et al. Does learning require memorization? A short tale about a long tail, 2019, STOC.
[29] Kurt Keutzer, et al. HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision, 2019, ICCV.
[30] Kilian Q. Weinberger, et al. BERTScore: Evaluating Text Generation with BERT, 2019, ICLR.
[31] J. Zico Kolter, et al. Uniform convergence may be unable to explain generalization in deep learning, 2019, NeurIPS.
[32] Michael W. Mahoney, et al. Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks, 2019, SDM.
[33] Michael W. Mahoney, et al. Traditional and Heavy-Tailed Self Regularization in Neural Network Models, 2019, ICML.
[34] Michael W. Mahoney, et al. Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning, 2018, J. Mach. Learn. Res.
[35] Hossein Mobahi, et al. Predicting the Generalization Gap in Deep Networks with Margin Distributions, 2018, ICLR.
[36] Myle Ott, et al. Understanding Back-Translation at Scale, 2018, EMNLP.
[37] Vitaly Shmatikov, et al. How To Backdoor Federated Learning, 2018, AISTATS.
[38] Myle Ott, et al. Scaling Neural Machine Translation, 2018, WMT.
[39] Hossein Mobahi, et al. Large Margin Deep Networks for Classification, 2018, NeurIPS.
[40] Andrew Gordon Wilson, et al. Averaging Weights Leads to Wider Optima and Better Generalization, 2018, UAI.
[41] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[42] Andrew Gordon Wilson, et al. Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs, 2018, NeurIPS.
[43] Pierre Vandergheynst, et al. PAC-Bayesian Margin Bounds for Convolutional Neural Networks, 2018.
[44] Michael W. Mahoney, et al. Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior, 2017, ArXiv.
[45] David A. McAllester, et al. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks, 2017, ICLR.
[46] Nathan Srebro, et al. Exploring Generalization in Deep Learning, 2017, NIPS.
[47] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[48] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[49] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[50] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[51] Lewis D. Griffin, et al. A Boundary Tilting Perspective on the Phenomenon of Adversarial Examples, 2016, ArXiv.
[52] Ryota Tomioka, et al. Norm-Based Capacity Control in Neural Networks, 2015, COLT.
[53] Philipp Koehn, et al. Findings of the 2014 Workshop on Statistical Machine Translation, 2014, WMT@ACL.
[54] D. Plenz, et al. powerlaw: A Python Package for Analysis of Heavy-Tailed Distributions, 2013, PLoS ONE.
[55] Mark E. J. Newman, et al. Power-Law Distributions in Empirical Data, 2007, SIAM Rev.
[56] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[57] David A. McAllester. PAC-Bayesian model averaging, 1999, COLT.
[58] Rajiv Khanna, et al. Generalization Properties of Stochastic Optimizers via Trajectory Analysis, 2021, ArXiv.
[59] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[60] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[61] Hang Li, et al. Deep learning for natural language processing: advantages and challenges, 2018.
[62] Marcello Federico, et al. Report on the 11th IWSLT evaluation campaign, 2014, IWSLT.
[63] Janet E. Rogers, et al. Orthogonal Distance Regression, 2009.
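The data-free metrics referred to in the title are of the kind developed in [15] and [32]-[34]: they are computed from the heavy-tailed spectral properties of a trained model's weight matrices, typically by fitting a power law to each layer's eigenvalue spectrum with the powerlaw package cited in [54]. Below is a minimal, illustrative sketch of such a fit for a single layer, not the authors' actual pipeline; the helper name layer_alpha and the random stand-in matrix are assumptions made for the example.

```python
# A minimal sketch (not the authors' exact pipeline): fit a power law to the
# eigenvalue spectrum of one weight matrix and report the exponent alpha,
# the kind of training/test-data-free metric discussed in [15], [32]-[34].
import numpy as np
import powerlaw  # the package cited in [54]: pip install powerlaw


def layer_alpha(weight_matrix: np.ndarray) -> float:
    """Estimate the heavy-tailed exponent of W's empirical spectral density."""
    # Eigenvalues of the correlation matrix W^T W are the squared singular values of W.
    eigenvalues = np.linalg.svd(weight_matrix, compute_uv=False) ** 2
    # powerlaw.Fit selects xmin by minimizing the KS distance to the fitted
    # power law and estimates alpha on the tail by maximum likelihood.
    fit = powerlaw.Fit(eigenvalues)
    return fit.power_law.alpha


if __name__ == "__main__":
    # Hypothetical stand-in for one Transformer feed-forward weight matrix;
    # in practice one would loop over the linear layers of a pre-trained
    # checkpoint (e.g., loaded via the Transformers library of [23]).
    rng = np.random.default_rng(0)
    W = rng.standard_normal((768, 3072))
    print(f"fitted power-law exponent alpha ~= {layer_alpha(W):.2f}")
```

In the cited work, trends in such per-layer exponents are reported to track model quality without touching any training or test data, which is what makes them candidates for evaluating NLP models in the setting the title describes.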