Hyung Won Chung | Jason Riesa | Chun-Sung Ferng | Jayden Ooi | Henry Tsai
[1] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[2] Kurt Keutzer, et al. Rethinking Batch Normalization in Transformers, 2020, ICML.
[3] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[4] Moshe Wasserblat, et al. Q8BERT: Quantized 8Bit BERT, 2019, Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[5] Taku Kudo, et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.
[6] Bo Chen, et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, 2018, CVPR.
[7] Quoc V. Le, et al. The Evolved Transformer, 2019, ICML.
[8] Edouard Grave, et al. Training with Quantization Noise for Extreme Model Compression, 2020, ICLR.
[9] Naveen Arivazhagan, et al. Small and Practical BERT Models for Sequence Labeling, 2019, EMNLP.
[10] Suyog Gupta, et al. To prune, or not to prune: exploring the efficacy of pruning for model compression, 2017, ICLR.
[11] Liang Lin, et al. SNAS: Stochastic Neural Architecture Search, 2018, ICLR.
[12] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[13] Elad Eban, et al. MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks, 2018, CVPR.
[14] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[15] Frank Hutter, et al. Neural Architecture Search: A Survey, 2018, J. Mach. Learn. Res.
[16] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[17] Graham Neubig, et al. XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization, 2020, ICML.
[18] Quoc V. Le, et al. BAM! Born-Again Multi-Task Networks for Natural Language Understanding, 2019, ACL.
[19] Geoffrey E. Hinton, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[20] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[21] James Demmel, et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes, 2019, ICLR.