Neural gradients are near-lognormal: improved quantized and sparse training
[1] C. A. R. Hoare, et al. Algorithm 65: find, 1961, Commun. ACM.
[2] Elad Hoffer, et al. Scalable Methods for 8-bit Training of Neural Networks, 2018, NeurIPS.
[3] Albert Gural, et al. Trained Uniform Quantization for Accurate and Efficient Neural Network Inference on Fixed-Point Hardware, 2019, arXiv.
[4] G. Biau, et al. High-Dimensional p-Norms, 2013, arXiv:1311.0587.
[5] Swagath Venkataramani, et al. PACT: Parameterized Clipping Activation for Quantized Neural Networks, 2018, arXiv.
[6] Alexander Heinecke, et al. Leveraging the bfloat16 Artificial Intelligence Datatype For Higher-Precision Computations, 2019, 2019 IEEE 26th Symposium on Computer Arithmetic (ARITH).
[7] Anahita Bhiwandiwalla, et al. Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks, 2020, ICLR.
[8] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[9] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[10] Swagath Venkataramani, et al. Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks, 2019, NeurIPS.
[11] Kamyar Azizzadenesheli, et al. signSGD: compressed optimisation for non-convex problems, 2018, ICML.
[12] Weisheng Zhao, et al. Accelerating CNN Training by Sparsifying Activation Gradients, 2019, arXiv.
[13] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[15] Daniel Soudry, et al. Post training 4-bit quantization of convolutional networks for rapid-deployment, 2018, NeurIPS.
[16] Pradeep Dubey, et al. A Study of BFLOAT16 for Deep Learning Training, 2019, arXiv.
[17] Shuang Wu, et al. Training and Inference with Integers in Deep Neural Networks, 2018, ICLR.
[18] Wojciech Samek, et al. Dithered backprop: A sparse and quantized backpropagation algorithm for more efficient deep neural network training, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[19] Xu Sun, et al. meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting, 2017, ICML.
[20] Avi Mendelson, et al. NICE: Noise Injection and Clamping Estimation for Neural Network Quantization, 2018, Mathematics.
[21] N. Smirnov. Table for Estimating the Goodness of Fit of Empirical Distributions, 1948.
[22] David Thorsley, et al. Post-training Piecewise Linear Quantization for Deep Neural Networks, 2020, ECCV.
[23] Tor M. Aamodt, et al. Sparse Weight Activation Training, 2020, NeurIPS.
[24] Daniel Brand, et al. Training Deep Neural Networks with 8-bit Floating Point Numbers, 2018, NeurIPS.
[25] Swagath Venkataramani, et al. Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN), 2018, arXiv.
[26] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Bernard Widrow, et al. Quantization Noise: Roundoff Error in Digital Computation, Signal Processing, Control, and Communications, 2008.