Training with Quantization Noise for Extreme Fixed-Point Compression
Hervé Jégou | Edouard Grave | Armand Joulin | Angela Fan | Benjamin Graham | Pierre Stock | Rémi Gribonval