Vahid Partovi Nia | Marzieh S. Tahaei | Ahmad Rashid | Mehdi Rezagholizadeh | Ali Edalati | James J. Clark
[1] Zhengxiao Du, et al. GPT Understands, Too, 2021, AI Open.
[2] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[3] Risto Lehtonen, et al. Multilevel Statistical Models, 2005.
[4] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.
[5] Mehdi Rezagholizadeh, et al. Fully Quantized Transformer for Machine Translation, 2020, EMNLP.
[6] Qun Liu, et al. TernaryBERT: Distillation-aware Ultra-low Bit BERT, 2020, EMNLP.
[7] Jia-Nan Wu, et al. Compression of fully-connected layer in neural network by Kronecker product, 2016 Eighth International Conference on Advanced Computational Intelligence (ICACI).
[8] C. Van Loan. The ubiquitous Kronecker product, 2000.
[9] Mehdi Rezagholizadeh, et al. MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation, 2021, ACL.
[10] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[11] Dacheng Tao, et al. On Compressing Deep Models by Low Rank and Sparse Decomposition, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[13] S. R. Searle, et al. On the history of the Kronecker product, 1983.
[14] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[15] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[16] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[17] Ali Ghodsi, et al. KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation, 2021, ArXiv.
[18] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[19] Ming Yang, et al. Compressing Deep Convolutional Networks using Vector Quantization, 2014, ArXiv.
[20] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[21] Matthew Mattina, et al. Compressing RNNs for IoT devices by 15-38x using Kronecker Products, 2019, ArXiv.
[22] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[23] Mehdi Rezagholizadeh, et al. ALP-KD: Attention-Based Layer Projection for Knowledge Distillation, 2020, AAAI.
[24] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[25] Mohammad Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[26] Christos Faloutsos, et al. Kronecker Graphs: An Approach to Modeling Networks, 2008, J. Mach. Learn. Res.
[27] Furu Wei, et al. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers, 2020, NeurIPS.
[28] Ali Ghodsi, et al. Annealing Knowledge Distillation, 2021, EACL.
[29] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[30] Mehdi Rezagholizadeh, et al. Improving Word Embedding Factorization for Compression using Distilled Nonlinear Neural Decomposition, 2020, EMNLP.