Knowledge Distillation of Russian Language Models with Reduction of Vocabulary