Lawrence Carin, Dinghan Shen, Changyou Chen, Weizhu Chen, Kevin J. Liang, Yufan Zhou, Weituo Hao
[1] Yu Sun, et al. ERNIE: Enhanced Representation through Knowledge Integration. 2019, ArXiv.
[2] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. 2019, ACL.
[3] Sungroh Yoon, et al. Adversarial Vertex Mixup: Toward Better Adversarially Robust Generalization. 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding. 2020, EMNLP.
[5] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling. 2019, NAACL.
[6] Seyed Iman Mirzadeh, et al. Improved Knowledge Distillation via Teacher Assistant. 2020, AAAI.
[7] Quoc V. Le, et al. Unsupervised Data Augmentation for Consistency Training. 2019, NeurIPS.
[8] Ido Dagan, et al. The Sixth PASCAL Recognizing Textual Entailment Challenge. 2009, TAC.
[9] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network. 2015, ArXiv.
[10] Jianfeng Gao, et al. UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training. 2020, ICML.
[11] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. 2017, NAACL.
[12] Quoc V. Le, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. 2020, ICLR.
[13] Xiaodong Liu, et al. Multi-Task Deep Neural Networks for Natural Language Understanding. 2019, ACL.
[14] Ido Dagan, et al. The Third PASCAL Recognizing Textual Entailment Challenge. 2007, ACL-PASCAL@ACL.
[15] Quoc V. Le, et al. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. 2018, ICLR.
[16] Kai Zou, et al. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. 2019, EMNLP.
[17] Ming-Wei Chang, et al. Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. 2019.
[18] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. 2018, BlackboxNLP@EMNLP.
[19] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding. 2019, NeurIPS.
[20] Xing Wu, et al. Conditional BERT Contextual Augmentation. 2018, ICCS.
[21] Quoc V. Le, et al. BAM! Born-Again Multi-Task Networks for Natural Language Understanding. 2019, ACL.
[22] Chris Brockett, et al. Automatically Constructing a Corpus of Sentential Paraphrases. 2005, IJCNLP.
[23] Subhabrata Mukherjee, et al. Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data. 2019.
[24] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression. 2019, EMNLP.
[25] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach. 2019, ArXiv.
[26] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019, NAACL.
[27] Partha Talukdar, et al. Submodular Optimization-based Diverse Paraphrasing and its Effectiveness in Data Augmentation. 2019, NAACL.
[28] Huan Wang, et al. MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models. 2019.
[29] Omer Levy, et al. SpanBERT: Improving Pre-training by Representing and Predicting Spans. 2019, TACL.
[30] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text. 2016, EMNLP.
[31] Roy Bar-Haim, et al. The Second PASCAL Recognising Textual Entailment Challenge. 2006.
[32] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. 2019, ArXiv.
[33] Lukasz Kaiser, et al. Attention is All you Need. 2017, NIPS.
[34] Dongdong Wang, et al. Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model. 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. 2020, ACL.
[36] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. 2019, ArXiv.
[37] Seong Joon Oh, et al. CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features. 2019, IEEE/CVF International Conference on Computer Vision (ICCV).
[38] Yang Song, et al. Extreme Language Model Compression with Optimal Subwords and Shared Projections. 2019, ArXiv.
[39] Jonathan Baxter, et al. A Model of Inductive Bias Learning. 2000, J. Artif. Intell. Res.
[40] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization. 2017, ICLR.
[41] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. 2013, EMNLP.
[42] Subhabrata Mukherjee, et al. Distilling Transformers into Simple Neural Networks with Unlabeled Transfer Data. 2019, ArXiv.
[43] Hongyu Guo, et al. Augmenting Data with Mixup for Sentence Classification: An Empirical Study. 2019, ArXiv.