Xipeng Qiu | Xuanjing Huang | Yige Xu | Ligao Zhou
[1] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[2] Xiaodong Liu, et al. Multi-Task Deep Neural Networks for Natural Language Understanding, 2019, ACL.
[3] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[4] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[5] Timo Aila, et al. Temporal Ensembling for Semi-Supervised Learning, 2016, ICLR.
[6] Iain Murray, et al. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning, 2019, ICML.
[7] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[8] Harri Valpola, et al. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results, 2017, ArXiv.
[9] Christopher Potts, et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.
[10] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[11] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[12] Mona Attariyan, et al. Parameter-Efficient Transfer Learning for NLP, 2019, ICML.
[13] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[14] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[15] Michael I. Jordan, et al. Advances in Neural Information Processing Systems 30, 1995.
[16] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.
[17] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[18] Xiaodong Liu, et al. Unified Language Model Pre-training for Natural Language Understanding and Generation, 2019, NeurIPS.
[19] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[20] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[21] Xuanjing Huang, et al. How to Fine-Tune BERT for Text Classification?, 2019, CCL.
[22] Noah A. Smith, et al. To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks, 2019, RepL4NLP@ACL.
[23] Christopher Potts, et al. Learning Word Vectors for Sentiment Analysis, 2011, ACL.