On the Importance of Pre-training Data Volume for Compact Language Models
Vincent Micheli | Martin d'Hoffschmidt | François Fleuret
[1] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[2] Guillaume Lample, et al. Cross-lingual Language Model Pretraining, 2019, NeurIPS.
[3] Avirup Sil, et al. Structured Pruning of a BERT-based Question Answering Model, 2019.
[4] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.
[5] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[6] Martin d'Hoffschmidt, et al. FQuAD: French Question Answering Dataset, 2020, Findings of EMNLP.
[7] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[8] Alexander M. Rush, et al. Movement Pruning: Adaptive Sparsity by Fine-Tuning, 2020, NeurIPS.
[9] Iz Beltagy, et al. SciBERT: A Pretrained Language Model for Scientific Text, 2019, EMNLP.
[10] Benoît Sagot, et al. Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures, 2019.
[11] Ming-Wei Chang, et al. Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation, 2019, ArXiv.
[12] Yin Yang, et al. Compressing Large-Scale Transformer-Based Models: A Case Study on BERT, 2020, ArXiv.
[13] Taku Kudo, et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.
[14] Veselin Stoyanov, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.
[15] Edouard Grave, et al. Training with Quantization Noise for Extreme Model Compression, 2020, ICLR.
[16] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[17] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[18] Quoc V. Le, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, ICLR.
[19] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[20] Kurt Keutzer, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2020, AAAI.
[21] Rémi Louf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[22] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[23] Laurent Romary, et al. CamemBERT: a Tasty French Language Model, 2019, ACL.
[24] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[25] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[26] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[27] Dan Klein, et al. Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers, 2020, ArXiv.
[28] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[29] Ali Farhadi, et al. Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping, 2020, ArXiv.
[30] Jaewoo Kang, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining, 2019, Bioinform.