TPrune: Efficient Transformer Pruning for Mobile Devices
Jiachen Mao | Huanrui Yang | Ang Li | Hai Li | Yiran Chen