Hai Zhao | Min Zhang | Hongqiu Wu
[1] Hanan Samet, et al. Pruning Filters for Efficient ConvNets, 2016, ICLR.
[2] Shahrokh Valaee, et al. EDropout: Energy-Based Dropout and Pruning of Deep Neural Networks, 2021, IEEE Transactions on Neural Networks and Learning Systems.
[3] Wonyong Sung, et al. Structured Pruning of Deep Convolutional Neural Networks, 2015, ACM J. Emerg. Technol. Comput. Syst.
[4] Yann LeCun, et al. Regularization of Neural Networks using DropConnect, 2013, ICML.
[5] Hai Zhao, et al. Code Summarization with Structure-induced Transformer, 2020, Findings.
[6] Anamitra R. Choudhury, et al. PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination, 2020, ICML.
[7] Timo Aila, et al. Pruning Convolutional Neural Networks for Resource Efficient Inference, 2016, ICLR.
[8] James T. Kwok, et al. SparseBERT: Rethinking the Importance Analysis in Self-attention, 2021, ICML.
[9] Mohit Iyyer, et al. Hard-Coded Gaussian Attention for Neural Machine Translation, 2020, ACL.
[10] Sepp Hochreiter, et al. Self-Normalizing Neural Networks, 2017, NIPS.
[11] Furu Wei, et al. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing, 2020, EMNLP.
[12] David Chiang, et al. Auto-Sizing Neural Networks: With Applications to n-gram Language Models, 2015, EMNLP.
[13] Christopher D. Manning, et al. Compression of Neural Machine Translation Models via Pruning, 2016, CoNLL.
[14] Christopher Potts, et al. Learning Word Vectors for Sentiment Analysis, 2011, ACL.
[15] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.
[16] Il-Chul Moon, et al. Adversarial Dropout for Supervised and Semi-supervised Learning, 2017, AAAI.
[17] Yi Tay, et al. Synthesizer: Rethinking Self-Attention for Transformer Models, 2020, ICML.
[18] Edouard Grave, et al. Reducing Transformer Depth on Demand with Structured Dropout, 2019, ICLR.
[19] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.
[20] Ashish Khetan, et al. schuBERT: Optimizing Elements of BERT, 2020, ACL.
[21] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[22] Erik F. Tjong Kim Sang, et al. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition, 2003, CoNLL.
[23] Babak Hassibi, et al. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon, 1992, NIPS.
[24] Andreas Loukas, et al. Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth, 2021, ICML.
[25] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[26] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[27] Yiming Yang, et al. DARTS: Differentiable Architecture Search, 2018, ICLR.
[28] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR.
[29] Nitish Srivastava, et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting, 2014, J. Mach. Learn. Res.
[30] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[31] Ariel D. Procaccia, et al. Variational Dropout and the Local Reparameterization Trick, 2015, NIPS.
[32] Omer Levy, et al. Improving Transformer Models by Reordering their Sublayers, 2020, ACL.
[33] Quoc V. Le, et al. Neural Architecture Search with Reinforcement Learning, 2016, ICLR.
[34] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.