Xiao Chen | Xin Jiang | Fang Wang | Lifeng Shang | Qun Liu | Xiaoqi Jiao | Huating Chang | Yichun Yin | Linlin Li
[1] Quoc V. Le, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, ICLR.
[2] Quoc V. Le, et al. The Evolved Transformer, 2019, ICML.
[3] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[4] Ruifeng Xu, et al. BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover’s Distance, 2020, EMNLP.
[5] Furu Wei, et al. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers, 2020, NeurIPS.
[6] Seyed Taghi Akhavan Niaki, et al. Optimizing a hybrid vendor-managed inventory and transportation problem with fuzzy demand: An improved particle swarm optimization algorithm, 2014, Inf. Sci.
[7] Avirup Sil, et al. Structured Pruning of a BERT-based Question Answering Model, 2019.
[8] Furu Wei, et al. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing, 2020, EMNLP.
[9] Preslav Nakov, et al. Poor Man's BERT: Smaller and Faster Transformer Models, 2020, ArXiv.
[10] Guangming Shi, et al. Network pruning using sparse learning and genetic algorithm, 2020, Neurocomputing.
[11] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[12] Lukasz Kaiser, et al. Universal Transformers, 2018, ICLR.
[13] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[14] X. Yao. Evolving Artificial Neural Networks, 1999.
[15] Yoshua Bengio, et al. FitNets: Hints for Thin Deep Nets, 2014, ICLR.
[16] Ming-Wei Chang, et al. Well-Read Students Learn Better: On the Importance of Pre-training Compact Models, 2019.
[17] Ming-Wei Chang, et al. Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation, 2019, ArXiv.
[18] Qun Liu, et al. Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads, 2020, AI Open.
[19] Edouard Grave, et al. Depth-Adaptive Transformer, 2020, ICLR.
[20] Michael Georgiopoulos, et al. Coupling weight elimination with genetic algorithms to reduce network size and preserve generalization, 1997, Neurocomputing.
[21] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[22] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[23] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[24] Moshe Wasserblat, et al. Q8BERT: Quantized 8Bit BERT, 2019, Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[25] Michael W. Mahoney, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2019, AAAI.
[26] Jian Qin, et al. A dynamic chain-like agent genetic algorithm for global numerical optimization and feature selection, 2009, Neurocomputing.
[27] Yonatan Belinkov, et al. Linguistic Knowledge and Transferability of Contextual Representations, 2019, NAACL.
[28] Anna Rumshisky, et al. A Primer in BERTology: What We Know About How BERT Works, 2020, Transactions of the Association for Computational Linguistics.
[29] Peng Zhou, et al. FastBERT: a Self-distilling BERT with Adaptive Inference Time, 2020, ACL.
[30] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[31] Quoc V. Le, et al. Evolving Normalization-Activation Layers, 2020, NeurIPS.
[32] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[33] Alan L. Yuille, et al. Genetic CNN, 2017, IEEE International Conference on Computer Vision (ICCV).
[34] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, 2019, ArXiv.
[35] Furu Wei, et al. MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers, 2021, Findings of ACL.
[36] Dorothea Heiss-Czedik, et al. An Introduction to Genetic Algorithms, 1997, Artificial Life.
[37] Ming Zhou, et al. A Tensorized Transformer for Language Modeling, 2019, NeurIPS.
[38] J. Scott McCarley, et al. Pruning a BERT-based Question Answering Model, 2019, ArXiv.
[39] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.
[40] Naveen Arivazhagan, et al. Small and Practical BERT Models for Sequence Labeling, 2019, EMNLP.
[41] Qun Liu, et al. DynaBERT: Dynamic BERT with Adaptive Width and Depth, 2020, NeurIPS.
[42] Kevin Duh, et al. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning, 2020, RepL4NLP@ACL.
[43] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[44] Kazuyuki Murase, et al. A new local search based hybrid genetic algorithm for feature selection, 2011, Neurocomputing.
[45] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[46] Dan Meng, et al. Extracting linguistic rules from data sets using fuzzy logic and genetic algorithms, 2012, Neurocomputing.
[47] Wentao Ma, et al. A Span-Extraction Dataset for Chinese Machine Reading Comprehension, 2019, EMNLP-IJCNLP.
[48] Yingming Li, et al. Fine-tune BERT with Sparse Self-Attention Mechanism, 2019, EMNLP.
[49] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[50] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.