[1] Kevin Duh, et al. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning, 2020, RepL4NLP@ACL.
[2] Jingren Zhou, et al. AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search, 2020, IJCAI.
[3] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.
[4] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, arXiv.
[5] Frank Hutter, et al. Neural Architecture Search: A Survey, 2018, J. Mach. Learn. Res.
[6] Yann Dauphin, et al. Language Modeling with Gated Convolutional Networks, 2016, ICML.
[7] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[8] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, 2019, arXiv.
[9] Quoc V. Le, et al. Searching for Activation Functions, 2018, arXiv.
[10] Quoc V. Le, et al. The Evolved Transformer, 2019, ICML.
[11] Rui Xu, et al. When NAS Meets Robustness: In Search of Robust Architectures Against Adversarial Attacks, 2020, CVPR.
[12] Peter Szolovits, et al. Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment, 2020, AAAI.
[13] Danilo Vasconcellos Vargas, et al. Evolving Robust Neural Architectures to Defend from Adversarial Attacks, 2019, AISafety@IJCAI.
[14] Yoshua Bengio, et al. Deep Sparse Rectifier Neural Networks, 2011, AISTATS.
[15] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[16] Andy B. Yoo, et al. X-ray Pulse Compression Using Strained Crystals, 2002.
[17] Aleksander Madry, et al. Towards Deep Learning Models Resistant to Adversarial Attacks, 2017, ICLR.
[18] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[19] Subhabrata Mukherjee, et al. XtremeDistil: Multi-stage Distillation for Massive Multilingual Models, 2020, ACL.
[20] Kalyanmoy Deb, et al. A Comparative Analysis of Selection Schemes Used in Genetic Algorithms, 1990, FOGA.
[21] Micah Goldblum, et al. Adversarially Robust Distillation, 2019, AAAI.
[22] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[23] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[24] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[25] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[26] Quoc V. Le, et al. Large-Scale Evolution of Image Classifiers, 2017, ICML.
[27] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[28] Xiang Zhang, et al. Character-level Convolutional Networks for Text Classification, 2015, NIPS.
[29] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, arXiv.
[30] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[31] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.
[32] Naveen Arivazhagan, et al. Small and Practical BERT Models for Sequence Labeling, 2019, EMNLP.
[33] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.
[34] Patrick D. McDaniel, et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples, 2016, arXiv.
[35] Joan Bruna, et al. Intriguing properties of neural networks, 2013, ICLR.
[36] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[37] Michael W. Mahoney, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2019, AAAI.
[38] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, arXiv.