HMC-TRAN: A Tensor-core Inspired Hierarchical Model Compression for Transformer-based DNNs on GPU
Hang Liu, Caiwen Ding, Hongwu Peng, Zhenglun Kong, Geng Yuan, Lei Yang, Shaoyi Huang, Shusen Wang, Shiyang Chen, Daniel Manu