Yin Yang | Preslav Nakov | Hassan Sajjad | Marianne Winslett | Xin Lou | Yao Chen | Mohammad Ali Khan | Deming Chen | Prakhar Ganesh
[1] Ming-Wei Chang, et al. Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation, 2019, ArXiv.
[2] Ran El-Yaniv, et al. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations, 2016, J. Mach. Learn. Res.
[3] Yang Song, et al. Extreme Language Model Compression with Optimal Subwords and Shared Projections, 2019, ArXiv.
[4] Moshe Wasserblat, et al. Q8BERT: Quantized 8Bit BERT, 2019, Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[5] Hongbo Deng, et al. AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search, 2020, ArXiv.
[6] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[7] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[8] Niranjan Balasubramanian, et al. Faster and Just As Accurate: A Simple Decomposition for Transformer Models, 2019.
[9] Subhabrata Mukherjee, et al. Distilling Transformers into Simple Neural Networks with Unlabeled Transfer Data, 2019, ArXiv.
[10] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[11] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[12] Martin Andrews, et al. Transformer to CNN: Label-scarce distillation for efficient text classification, 2019, ArXiv.
[13] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[14] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[15] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[16] Yves Scherrer, et al. Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation, 2020, EMNLP.
[17] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[18] Sanja Fidler, et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books, 2015, IEEE International Conference on Computer Vision (ICCV).
[19] Jimmy J. Lin, et al. Natural Language Generation for Effective Knowledge Distillation, 2019, EMNLP.
[20] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[21] Jungang Xu, et al. A Survey on Neural Network Language Models, 2019, ArXiv.
[22] Kevin Duh, et al. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning, 2020, RepL4NLP@ACL.
[23] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[24] Yanzhi Wang, et al. Reweighted Proximal Pruning for Large-Scale Language Representation, 2019, ArXiv.
[25] Omer Levy, et al. What Does BERT Look at? An Analysis of BERT's Attention, 2019, BlackboxNLP@ACL.
[26] Kurt Keutzer, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2020, AAAI.
[27] Tao Zhang, et al. A Survey of Model Compression and Acceleration for Deep Neural Networks, 2017, ArXiv.
[28] Yin Yang, et al. Fine-Grained Propaganda Detection with Fine-Tuned BERT, 2019, EMNLP.
[29] Joyce Y. Chai, et al. Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches, 2019.
[30] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[31] Xiyuan Zhang, et al. D-NET: A Pre-Training and Fine-Tuning Framework for Improving the Generalization of Machine Reading Comprehension, 2019, EMNLP.
[32] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[33] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[34] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[36] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[37] Wenan Zhou, et al. A survey of word embeddings based on deep learning, 2019, Computing.
[38] Laurent Besacier, et al. Naver Labs Europe's Systems for the Document-Level Generation and Translation Task at WNGT 2019, 2019, EMNLP.
[39] Mohammad Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.