Yelysei Bondarenko | Markus Nagel | Tijmen Blankevoort
[1] Edouard Grave, et al. Training with Quantization Noise for Extreme Model Compression, 2020, ICLR.
[2] Yuandong Tian, et al. Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search, 2018, arXiv.
[3] Steven K. Esser, et al. Learned Step Size Quantization, 2019, ICLR.
[4] Quoc V. Le, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, ICLR.
[5] Xianglong Liu, et al. Towards Unified INT8 Training for Convolutional Neural Network, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Sachin S. Talathi, et al. Fixed Point Quantization of Deep Convolutional Networks, 2015, ICML.
[7] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[8] Mingkui Tan, et al. NAT: Neural Architecture Transformer for Accurate and Compact Architectures, 2019, NeurIPS.
[9] Daniel Soudry, et al. Post training 4-bit quantization of convolutional networks for rapid-deployment, 2018, NeurIPS.
[10] Ali Farhadi, et al. Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping, 2020, arXiv.
[11] Kurt Keutzer, et al. HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[12] Markus Nagel, et al. Data-Free Quantization Through Weight Equalization and Bias Correction, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[13] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, arXiv.
[14] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[15] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[16] Nikolaos Pappas, et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, 2020, ICML.
[17] Yi Tay, et al. Efficient Transformers: A Survey, 2020, arXiv.
[18] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[19] Mona Attariyan, et al. Parameter-Efficient Transfer Learning for NLP, 2019, ICML.
[20] Ying Wang, et al. Bayesian Bits: Unifying Quantization and Pruning, 2020, NeurIPS.
[21] Jesse Vig, et al. A Multiscale Visualization of Attention in the Transformer Model, 2019, ACL.
[22] Shuchang Zhou, et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, 2016, arXiv.
[23] Mark Chen, et al. Generative Pretraining From Pixels, 2020, ICML.
[24] Kushal Datta, et al. Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model, 2019, arXiv.
[25] Raghuraman Krishnamoorthi, et al. Quantizing deep convolutional networks for efficient inference: A whitepaper, 2018, arXiv.
[26] Ran El-Yaniv, et al. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations, 2016, J. Mach. Learn. Res..
[27] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[28] Alexander M. Rush, et al. Movement Pruning: Adaptive Sparsity by Fine-Tuning, 2020, NeurIPS.
[29] Hongbin Zha, et al. Alternating Multi-bit Quantization for Recurrent Neural Networks, 2018, ICLR.
[30] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[31] Seyed-Mohsen Moosavi-Dezfooli, et al. Adaptive Quantization for Deep Neural Network, 2017, AAAI.
[32] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[33] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.
[34] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, arXiv.
[35] Yoshua Bengio, et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, 2013, arXiv.
[36] Shuang Wu, et al. Training and Inference with Integers in Deep Neural Networks, 2018, ICLR.
[37] Lukasz Kaiser, et al. Universal Transformers, 2018, ICLR.
[38] Rana Ali Amjad, et al. A White Paper on Neural Network Quantization, 2021, arXiv.
[39] Pritish Narayanan, et al. Deep Learning with Limited Numerical Precision, 2015, ICML.
[40] Marcin Junczys-Dowmunt, et al. Marian: Cost-effective High-Quality Neural Machine Translation in C++, 2018, NMT@ACL.
[41] Iain Murray, et al. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning, 2019, ICML.
[42] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[43] Bo Chen, et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[44] Kurt Keutzer, et al. I-BERT: Integer-only BERT Quantization, 2021, ICML.
[45] Omer Levy, et al. What Does BERT Look at? An Analysis of BERT's Attention, 2019, BlackboxNLP@ACL.
[46] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.
[47] Lei Deng, et al. HitNet: Hybrid Ternary Recurrent Neural Network, 2018, NeurIPS.
[48] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[49] Anna Rumshisky, et al. When BERT Plays the Lottery, All Tickets Are Winning, 2020, EMNLP.
[50] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[51] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, arXiv.
[52] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[53] Forrest N. Iandola, et al. SqueezeBERT: What can computer vision teach NLP about efficient neural networks?, 2020, SUSTAINLP.
[54] Lukasz Kaiser, et al. Rethinking Attention with Performers, 2020, arXiv.
[55] Song Han, et al. HAT: Hardware-Aware Transformers for Efficient Natural Language Processing, 2020, ACL.
[56] Nicolas Usunier, et al. End-to-End Object Detection with Transformers, 2020, ECCV.
[57] Rana Ali Amjad, et al. Up or Down? Adaptive Rounding for Post-Training Quantization, 2020, ICML.
[58] Zhijian Liu, et al. HAQ: Hardware-Aware Automated Quantization With Mixed Precision, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Hector J. Levesque, et al. The Winograd Schema Challenge, 2011, AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.
[60] Yoni Choukroun, et al. Low-bit Quantization of Neural Networks for Efficient Inference, 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[61] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[62] Alessandro Forin, et al. Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point, 2020, NeurIPS.
[63] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[64] G. Hua, et al. LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks, 2018, ECCV.
[65] Jinwon Lee, et al. LSQ+: Improving low-bit quantization through learnable offsets and better initialization, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[66] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[67] Andrew Zisserman, et al. Video Action Transformer Network, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[68] Moshe Wasserblat, et al. Q8BERT: Quantized 8Bit BERT, 2019, 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[69] Michael W. Mahoney, et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT, 2019, AAAI.