暂无分享,去创建一个
[1] Forrest N. Iandola,et al. SqueezeBERT: What can computer vision teach NLP about efficient neural networks? , 2020, SUSTAINLP.
[2] Furu Wei,et al. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers , 2020, NeurIPS.
[3] Omer Levy,et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension , 2019, ACL.
[4] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[5] Rich Caruana,et al. Model compression , 2006, KDD '06.
[6] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[7] Bo Chen,et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[8] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[9] Christopher D. Manning,et al. Get To The Point: Summarization with Pointer-Generator Networks , 2017, ACL.
[10] Maria Leonor Pacheco,et al. of the Association for Computational Linguistics: , 2001 .
[11] Edouard Grave,et al. Reducing Transformer Depth on Demand with Structured Dropout , 2019, ICLR.
[12] Yao Zhao,et al. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization , 2020, ICML.
[13] Jungo Kasai,et al. Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation , 2020, ArXiv.
[14] Meng Sun,et al. Baidu Neural Machine Translation Systems for WMT19 , 2019, WMT.
[15] Huaiyu Zhu. On Information and Sufficiency , 1997 .
[16] Furu Wei,et al. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing , 2020, EMNLP.
[17] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[18] Alexander M. Rush,et al. Sequence-Level Knowledge Distillation , 2016, EMNLP.
[19] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[20] Mirella Lapata,et al. Noisy Self-Knowledge Distillation for Text Summarization , 2020, ArXiv.
[21] Marcin Junczys-Dowmunt,et al. Microsoft Translator at WMT 2019: Towards Large-Scale Document-Level Neural Machine Translation , 2019, WMT.
[22] Yiming Yang,et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices , 2020, ACL.
[23] Thomas Wolf,et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.
[24] Jörg Tiedemann,et al. OPUS-MT – Building open translation services for the World , 2020, EAMT.
[25] Mirella Lapata,et al. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization , 2018, EMNLP.
[26] Marjan Ghazvininejad,et al. Multilingual Denoising Pre-training for Neural Machine Translation , 2020, Transactions of the Association for Computational Linguistics.
[27] Karin M. Verspoor,et al. Findings of the 2016 Conference on Machine Translation , 2016, WMT.
[28] Qun Liu,et al. TinyBERT: Distilling BERT for Natural Language Understanding , 2020, EMNLP.
[29] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.