Habib Hajimolahoseini | Ivan Kobyzev | Yang Liu | Tianda Li | Yassir El Mesbahi | Ahmad Rashid | Atif Mahmud | Nithin Anchuri | Mehdi Rezagholizadeh
[1] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[2] Zachary Chase Lipton, et al. Born Again Neural Networks, 2018, ICML.
[3] Mehdi Rezagholizadeh, et al. Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax, 2021, FINDINGS.
[4] Mehdi Rezagholizadeh, et al. ALP-KD: Attention-Based Layer Projection for Knowledge Distillation, 2020, AAAI.
[5] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[6] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[7] Anna Rumshisky, et al. A Primer in BERTology: What We Know About How BERT Works, 2020, Transactions of the Association for Computational Linguistics.
[8] Mehdi Rezagholizadeh, et al. Towards Zero-Shot Knowledge Distillation for Natural Language Processing, 2020, EMNLP.
[9] Olatunji Ruwase, et al. ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning, 2021, SC21: International Conference for High Performance Computing, Networking, Storage and Analysis.
[10] Omer Levy, et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems, 2019, NeurIPS.
[11] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[12] Mehdi Rezagholizadeh, et al. MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation, 2021, ACL.
[13] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[14] Olatunji Ruwase, et al. DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters, 2020, KDD.
[15] Jinwoo Shin, et al. Regularizing Class-Wise Predictions via Self-Knowledge Distillation, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Manish Gupta, et al. Compression of Deep Learning Models for Text: A Survey, 2022, ACM Trans. Knowl. Discov. Data.
[17] Furu Wei, et al. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing, 2020, EMNLP.
[18] Preslav Nakov, et al. Poor Man's BERT: Smaller and Faster Transformer Models, 2020, ArXiv.
[19] Ali Ghodsi, et al. Annealing Knowledge Distillation, 2021, EACL.
[20] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[22] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[23] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[24] Pascal Poupart, et al. RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation, 2021, NAACL-HLT.
[25] Olatunji Ruwase, et al. ZeRO: Memory Optimization Towards Training A Trillion Parameter Models, 2019, SC.
[26] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[27] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[28] Rich Caruana, et al. Model compression, 2006, KDD '06.
[29] Yanjun Qi, et al. Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers, 2018, 2018 IEEE Security and Privacy Workshops (SPW).
[30] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[31] Ali Ghodsi, et al. How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding, 2021, ArXiv.
[32] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[33] Olatunji Ruwase, et al. ZeRO-Offload: Democratizing Billion-Scale Model Training, 2021, USENIX ATC.
[34] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.