BiBERT: Accurate Fully Binarized BERT
Haotong Qin, Yifu Ding, Mingyuan Zhang, Qinghua Yan, Aishan Liu, Qingqing Dang, Ziwei Liu, Xianglong Liu