Kurt Keutzer | Zhen Dong | Michael W. Mahoney | Amir Gholami | Zhewei Yao | Sehoon Kim
[1] Daniel Soudry,et al. Post training 4-bit quantization of convolutional networks for rapid-deployment , 2018, NeurIPS.
[2] Forrest N. Iandola,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.
[3] Nojun Kwak,et al. Position-based Scaled Gradient for Model Quantization and Sparse Training , 2020, ArXiv.
[4] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[5] B. Shastri,et al. Dynamic Precision Analog Computing for Neural Networks , 2021, IEEE Journal of Selected Topics in Quantum Electronics.
[6] Yan Wang,et al. Rotated Binary Neural Network , 2020, NeurIPS.
[7] S. Stigler,et al. The History of Statistics: The Measurement of Uncertainty before 1900 by Stephen M. Stigler (review) , 1986, Technology and Culture.
[8] Quoc V. Le,et al. Searching for MobileNetV3 , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[9] Hai Victor Habi,et al. HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs , 2020, ECCV.
[10] Edouard Grave,et al. Training with Quantization Noise for Extreme Model Compression , 2020, ICLR.
[11] Eunhyeok Park,et al. Value-aware Quantization for Training and Inference of Neural Networks , 2018, ECCV.
[12] Ian D. Reid,et al. Towards Effective Low-Bitwidth Convolutional Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[13] Qi Tian,et al. Data-Free Learning of Student Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[14] Georgios Tzimiropoulos,et al. Training Binary Neural Networks with Real-to-Binary Convolutions , 2020, ICLR.
[15] C. John Glossner,et al. Pruning and Quantization for Deep Neural Network Acceleration: A Survey , 2021, Neurocomputing.
[16] Jing Jin,et al. KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization , 2021, ArXiv.
[17] Wei Pan,et al. Towards Accurate Binary Convolutional Neural Network , 2017, NIPS.
[18] Yinghai Lu,et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems , 2019, ArXiv.
[19] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016 .
[20] Vikas Singh,et al. A Biresolution Spectral Framework for Product Quantization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[21] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[22] Bo Chen,et al. NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications , 2018, ECCV.
[23] Jian Cheng,et al. Generative Zero-shot Network Quantization , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[24] Rémi Gribonval,et al. And the Bit Goes Down: Revisiting the Quantization of Neural Networks , 2019, ICLR.
[25] Philip H. S. Torr,et al. SNIP: Single-shot Network Pruning based on Connection Sensitivity , 2018, ICLR.
[26] C. Shannon. Coding Theorems for a Discrete Source With a Fidelity Criterion , 1959 .
[27] Swagath Venkataramani,et al. BiScaled-DNN: Quantizing Long-tailed Datastructures with Two Scale Factors for Deep Neural Networks , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[28] Asit K. Mishra,et al. Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy , 2017, ICLR.
[29] Jiwen Lu,et al. Learning Channel-Wise Interactions for Binary Convolutional Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Bingbing Ni,et al. Variational Convolutional Neural Network Pruning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Dan Alistarh,et al. Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks , 2021, J. Mach. Learn. Res..
[32] Mark Horowitz,et al. 1.1 Computing's energy problem (and what we can do about it) , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[33] Kurt Keutzer,et al. Hessian-Aware Pruning and Optimal Neural Implant , 2021, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
[34] Kurt Keutzer,et al. Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[35] Song Han,et al. AMC: AutoML for Model Compression and Acceleration on Mobile Devices , 2018, ECCV.
[36] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[37] Frank Hutter,et al. Neural Architecture Search: A Survey , 2018, J. Mach. Learn. Res..
[38] Ali Farhadi,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.
[39] Sanguthevar Rajasekaran,et al. AutoPrune: Automatic Network Pruning by Regularizing Auxiliary Parameters , 2019, NeurIPS.
[40] Maxim Naumov,et al. On Periodic Functions as Regularizers for Quantization of Neural Networks , 2018, ArXiv.
[41] Song Han,et al. HAQ: Hardware-Aware Automated Quantization , 2018, ArXiv.
[42] Yoshua Bengio,et al. Training deep neural networks with low precision multiplications , 2014 .
[43] Luca Benini,et al. Leveraging Automated Mixed-Low-Precision Quantization for Tiny Edge Microcontrollers , 2020, IoT Streams/ITEM@PKDD/ECML.
[44] Kaisheng Ma,et al. Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[45] Diana Marculescu,et al. One Weight Bitwidth to Rule Them All , 2020, ECCV Workshops.
[46] Philipp Birken,et al. Numerical Linear Algebra , 2011, Encyclopedia of Parallel Computing.
[47] Ian D. Reid,et al. Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[49] Lav R. Varshney,et al. Optimal Information Storage in Noisy Synapses under Resource Constraints , 2006, Neuron.
[50] Junmo Kim,et al. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Sinno Jialin Pan,et al. MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization , 2019, NeurIPS.
[52] Rongrong Ji,et al. Accelerating Convolutional Networks via Global & Dynamic Filter Pruning , 2018, IJCAI.
[53] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[55] Neil D. Lawrence,et al. Variational Information Distillation for Knowledge Transfer , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Nilanjan Ray,et al. Layer Importance Estimation with Imprinting for Neural Network Quantization , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[57] Jian Cheng,et al. Quantized Convolutional Neural Networks for Mobile Devices , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[58] Bo Xu,et al. Distilled Binary Neural Network for Monaural Speech Separation , 2018, 2018 International Joint Conference on Neural Networks (IJCNN).
[59] B. Kailkhura,et al. Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network , 2021, ICLR.
[60] Vikas Chandra,et al. CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs , 2018, ArXiv.
[61] Yale Song,et al. Learning from Noisy Labels with Distillation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[62] Jingdong Wang,et al. Distillation-Guided Residual Learning for Binary Convolutional Neural Networks , 2020, IEEE Transactions on Neural Networks and Learning Systems.
[63] Xin Dong,et al. Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon , 2017, NIPS.
[64] Alexander Finkelstein,et al. Fighting Quantization Bias With Bias , 2019, ArXiv.
[65] Jiwen Lu,et al. Learning Deep Binary Descriptor with Multi-Quantization , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[66] G. Hua,et al. LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks , 2018, ECCV.
[67] Jinwon Lee,et al. LSQ+: Improving low-bit quantization through learnable offsets and better initialization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[68] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[69] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[70] Xin Dong,et al. A Main/Subsidiary Network Framework for Simplifying Binary Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[71] Sachin S. Talathi,et al. Fixed Point Quantization of Deep Convolutional Networks , 2015, ICML.
[72] Ying Wang,et al. Bayesian Bits: Unifying Quantization and Pruning , 2020, NeurIPS.
[73] Yiran Chen,et al. BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization , 2021, ICLR.
[74] James J. Little,et al. LSQ++: Lower Running Time and Higher Recall in Multi-codebook Quantization , 2018, ECCV.
[75] Zhenyu Liao,et al. Sparse Quantized Spectral Clustering , 2020, ArXiv.
[76] B.M. Oliver,et al. The Philosophy of PCM , 1948, Proceedings of the IRE.
[77] Song Han,et al. APQ: Joint Search for Network Architecture, Pruning and Quantization Policy , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[78] Ben Poole,et al. Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.
[79] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[80] Xin Dong,et al. Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit? , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[81] Greg Mori,et al. CLIP-Q: Deep Network Compression Learning by In-parallel Pruning-Quantization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[82] Jeff Johnson,et al. Rethinking floating point for deep learning , 2018, ArXiv.
[83] Yuandong Tian,et al. Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search , 2018, ArXiv.
[84] Xianglong Liu,et al. Balanced Binary Neural Networks with Gated Residual , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[85] Derek Hoiem,et al. Dreaming to Distill: Data-Free Knowledge Transfer via DeepInversion , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[86] Jian Sun,et al. Deep Learning with Low Precision by Half-Wave Gaussian Quantization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[87] Kurt Keutzer,et al. HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[88] Javier Duarte,et al. Ps and Qs: Quantization-Aware Pruning for Efficient Low Latency Neural Network Inference , 2021, Frontiers in Artificial Intelligence.
[89] Jungwon Lee,et al. Towards the Limit of Network Quantization , 2016, ICLR.
[90] Prad Kadambi. Comparing Fisher Information Regularization with Distillation for DNN Quantization , 2020 .
[91] Markus Nagel,et al. Data-Free Quantization Through Weight Equalization and Bias Correction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[92] Swagath Venkataramani,et al. PACT: Parameterized Clipping Activation for Quantized Neural Networks , 2018, ArXiv.
[93] Tao Yue,et al. Distribution-aware Adaptive Multi-bit Quantization , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[94] Harri Valpola,et al. Weight-averaged consistency targets improve semi-supervised deep learning results , 2017, ArXiv.
[95] Yoshua Bengio,et al. Difference Target Propagation , 2014, ECML/PKDD.
[96] Daniel Brand,et al. Training Deep Neural Networks with 8-bit Floating Point Numbers , 2018, NeurIPS.
[97] Ling Shao,et al. TBN: Convolutional Neural Network with Ternary Inputs and Binary Weights , 2018, ECCV.
[98] Uri Weiser,et al. Post-Training Sparsity-Aware Quantization , 2021, NeurIPS.
[99] Naiyan Wang,et al. Data-Driven Sparse Structure Selection for Deep Neural Networks , 2017, ECCV.
[100] Jian Cheng,et al. From Hashing to CNNs: Training BinaryWeight Networks via Hashing , 2018, AAAI.
[101] Dhireesha Kudithipudi,et al. Cheetah: Mixed Low-Precision Hardware & Software Co-Design Framework for DNNs on the Edge , 2019, ArXiv.
[102] Bingbing Ni,et al. Performance Guaranteed Network Acceleration via High-Order Residual Quantization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[103] Masashi Sugiyama,et al. Learning Efficient Tensor Representations with Ring-structured Networks , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[104] James T. Kwok,et al. Loss-aware Binarization of Deep Networks , 2016, ICLR.
[105] Song Han,et al. Trained Ternary Quantization , 2016, ICLR.
[106] Yang Yang,et al. BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction , 2021, ICLR.
[107] David Thorsley,et al. Post-training Piecewise Linear Quantization for Deep Neural Networks , 2020, ECCV.
[108] Xianglong Liu,et al. Forward and Backward Information Retention for Accurate Binary Neural Networks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[109] P. P. Kanjilal,et al. Reduced-size neural networks through singular value decomposition and subset selection , 1993 .
[110] Zhenyu Liao,et al. AdaBits: Neural Network Quantization With Adaptive Bit-Widths , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[111] Luca Benini,et al. GAP-8: A RISC-V SoC for AI at the Edge of the IoT , 2018, 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[112] Jacob Devlin,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[113] Quoc V. Le,et al. Neural Architecture Search with Reinforcement Learning , 2016, ICLR.
[114] Yoshua Bengio,et al. FitNets: Hints for Thin Deep Nets , 2014, ICLR.
[115] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[116] Qun Liu,et al. TernaryBERT: Distillation-aware Ultra-low Bit BERT , 2020, EMNLP.
[117] Brian Chmiel,et al. Neural gradients are near-lognormal: improved quantized and sparse training , 2020, ICLR.
[118] Bin Liu,et al. Ternary Weight Networks , 2016, ArXiv.
[119] Cordelia Schmid,et al. Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[120] Shuchang Zhou,et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients , 2016, ArXiv.
[121] Patrick Judd,et al. Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation , 2020, ArXiv.
[122] David Thorsley,et al. Near-Lossless Post-Training Quantization of Deep Neural Networks via a Piecewise Linear Approximation , 2020, ArXiv.
[123] Zhiru Zhang,et al. Improving Neural Network Quantization without Retraining using Outlier Channel Splitting , 2019, ICML.
[124] Desmond P. Taylor,et al. Is Information in the Brain Represented in Continuous or Discrete Form? , 2018, IEEE Transactions on Molecular, Biological and Multi-Scale Communications.
[125] John R. Gilbert,et al. Challenges and Advances in Parallel Sparse Matrix-Matrix Multiplication , 2008, 2008 37th International Conference on Parallel Processing.
[126] Kushal Datta,et al. Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model , 2019, ArXiv.
[127] Anahita Bhiwandiwalla,et al. Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks , 2020, ICLR.
[128] Larry S. Davis,et al. NISP: Pruning Networks Using Neuron Importance Score Propagation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[129] Georgios Tzimiropoulos,et al. High-Capacity Expert Binary Networks , 2020, ICLR.
[130] Ji Liu,et al. Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-Based Approach , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[131] Hao Wu,et al. Mixed Precision Training , 2017, ICLR.
[132] Yoshua Bengio,et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation , 2013, ArXiv.
[133] Martin Rinard,et al. Efficient Exact Verification of Binarized Neural Networks , 2020, NeurIPS.
[134] Chuang Gan,et al. Once for All: Train One Network and Specialize it for Efficient Deployment , 2019, ICLR.
[135] Eunhyeok Park,et al. Weighted-Entropy-Based Quantization for Deep Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[136] Diana Marculescu,et al. Regularizing Activation Distribution for Training Binarized Deep Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[137] Mingkui Tan,et al. Generative Low-bitwidth Data Free Quantization , 2020, ECCV.
[138] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[139] Wonyong Sung,et al. Fixed-point performance analysis of recurrent neural networks , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[140] Eriko Nurvitadhi,et al. WRPN: Wide Reduced-Precision Networks , 2017, ICLR.
[141] Pedro M. Domingos,et al. Deep Learning as a Mixed Convex-Combinatorial Optimization Problem , 2017, ICLR.
[142] W. Sheppard. On the Calculation of the most Probable Values of Frequency‐Constants, for Data arranged according to Equidistant Division of a Scale , 1897 .
[143] Jungwon Lee,et al. Learning Low Precision Deep Neural Networks through Regularization , 2018, ArXiv.
[144] Boris Flach,et al. Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks , 2020, NeurIPS.
[145] Jianxin Wu,et al. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[146] Maja Pantic,et al. Improved training of binary networks for human pose estimation and image recognition , 2019, ArXiv.
[147] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[148] C. Koch,et al. Is perception discrete or continuous? , 2003, Trends in Cognitive Sciences.
[149] Quoc V. Le,et al. Swish: a Self-Gated Activation Function , 2017, ArXiv.
[150] Jack Xin,et al. Blended coarse gradient descent for full quantization of deep neural networks , 2018, Research in the Mathematical Sciences.
[151] Mouloud Belbahri,et al. BNN+: Improved Binary Network Training , 2018, ArXiv.
[152] Ray C. C. Cheung,et al. Accurate and Compact Convolutional Neural Networks with Trained Binarization , 2019, BMVC.
[153] David A. Huffman,et al. A method for the construction of minimum-redundancy codes , 1952, Proceedings of the IRE.
[154] Erich Elsen,et al. The State of Sparsity in Deep Neural Networks , 2019, ArXiv.
[155] Yoshua Bengio,et al. Neural Networks with Few Multiplications , 2015, ICLR.
[156] Hyungjun Kim,et al. BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations , 2020, ICLR.
[157] Max Welling,et al. Gradient 𝓁1 Regularization for Quantization Robustness , 2020, ArXiv.
[158] Michael R. Lyu,et al. BinaryBERT: Pushing the Limit of BERT Quantization , 2020, ACL.
[159] Yan Wang,et al. Fully Quantized Network for Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[160] Dharmendra S. Modha,et al. Discovering Low-Precision Networks Close to Full-Precision Networks for Efficient Embedded Inference , 2018, ArXiv.
[161] Lothar Thiele,et al. Adaptive Loss-Aware Quantization for Multi-Bit Networks , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[162] Zhenzhi Wu,et al. GXNOR-Net: Training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework , 2017, Neural Networks.
[163] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[164] Dan Alistarh,et al. Model compression via distillation and quantization , 2018, ICLR.
[165] Paris Smaragdis,et al. Bitwise Neural Networks , 2016, ArXiv.
[166] Kurt Keutzer,et al. ZeroQ: A Novel Zero Shot Quantization Framework , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[167] Kwang-Ting Cheng,et al. Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization , 2019, NeurIPS.
[168] Matthew Mattina,et al. Learning low-precision neural networks without Straight-Through Estimator(STE) , 2019, IJCAI.
[169] Eirikur Agustsson,et al. Universally Quantized Neural Compression , 2020, NeurIPS.
[170] Roberto Cipolla,et al. Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[171] Avi Mendelson,et al. UNIQ: Uniform Noise Injection for Non-Uniform Quantization of Neural Networks , 2018, ACM Trans. Comput. Syst..
[172] Kurt Keutzer,et al. HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks , 2020, NeurIPS.
[173] Uri Weiser,et al. Robust Quantization: One Model to Rule Them All , 2020, NeurIPS.
[174] Kush R. Varshney,et al. Decision Making With Quantized Priors Leads to Discrimination , 2017, Proceedings of the IEEE.
[175] Jonathan W. Pillow,et al. Single-trial spike trains in parietal cortex reveal discrete steps during decision-making , 2015, Science.
[176] W S McCulloch,et al. A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.
[177] David L. Neuhoff,et al. Quantization , 1998, IEEE Trans. Inf. Theory.
[178] Yiming Yang,et al. DARTS: Differentiable Architecture Search , 2018, ICLR.
[179] Daisuke Miyashita,et al. Convolutional Neural Networks using Logarithmic Data Representation , 2016, ArXiv.
[180] Wei Wang,et al. Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks , 2020, ICLR.
[181] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[182] Lane A. Hemaspaandra,et al. Using simulated annealing to design good codes , 1987, IEEE Trans. Inf. Theory.
[183] Niranjan Balasubramanian,et al. On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers , 2021, FINDINGS.
[184] Lin Xu,et al. Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights , 2017, ICLR.
[185] Yurong Chen,et al. Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[186] James T. Kwok,et al. Loss-aware Weight Quantization of Deep Networks , 2018, ICLR.
[187] Kuilin Chen,et al. Incremental few-shot learning via vector quantization in deep embedded space , 2021, ICLR.
[188] Yuandong Tian,et al. FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[189] Rongrong Ji,et al. Circulant Binary Convolutional Networks: Enhancing the Performance of 1-Bit DCNNs With Circulant Back Propagation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[190] Xianglong Liu,et al. Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[191] Song Han,et al. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware , 2018, ICLR.
[192] Jian Cheng,et al. Learning Compression from Limited Unlabeled Data , 2018, ECCV.
[193] Jihwan P. Choi,et al. Data-Free Network Quantization With Adversarial Knowledge Distillation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[194] Geoffrey C. Fox,et al. A deterministic annealing approach to clustering , 1990, Pattern Recognit. Lett..
[195] Seyed-Mohsen Moosavi-Dezfooli,et al. Adaptive Quantization for Deep Neural Network , 2017, AAAI.
[196] Wei Liu,et al. Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm , 2018, ECCV.
[197] Quoc V. Le,et al. Searching for Activation Functions , 2018, ArXiv.
[198] Soheil Ghiasi,et al. Hardware-oriented Approximation of Convolutional Neural Networks , 2016, ArXiv.
[199] Jose Javier Gonzalez Ortiz,et al. What is the State of Neural Network Pruning? , 2020, MLSys.
[200] Joe Lou,et al. Confounding Tradeoffs for Neural Network Quantization , 2021, ArXiv.
[201] Babak Hassibi,et al. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon , 1992, NIPS.
[202] Elad Hoffer,et al. Scalable Methods for 8-bit Training of Neural Networks , 2018, NeurIPS.
[203] Dan Alistarh,et al. Adaptive Gradient Quantization for Data-Parallel SGD , 2020, NeurIPS.
[204] Rana Ali Amjad,et al. Up or Down? Adaptive Rounding for Post-Training Quantization , 2020, ICML.
[205] Yoni Choukroun,et al. Low-bit Quantization of Neural Networks for Efficient Inference , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[206] Soheil Ghiasi,et al. Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks , 2018, IEEE Transactions on Neural Networks and Learning Systems.
[207] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[208] Christoph Meinel,et al. BoolNet: Minimizing The Energy Consumption of Binary Neural Networks , 2021, ArXiv.
[209] Ming Yang,et al. Compressing Deep Convolutional Networks using Vector Quantization , 2014, ArXiv.
[210] Seungwon Lee,et al. Quantization for Rapid Deployment of Deep Neural Networks , 2018, ArXiv.
[211] Dacheng Tao,et al. Searching for Low-Bit Weights in Quantized Neural Networks , 2020, NeurIPS.
[212] Ron Banner,et al. Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming , 2020, ArXiv.
[213] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[214] Dongsoo Lee,et al. BiQGEMM: Matrix Multiplication with Lookup Table for Binary-Coding-Based Quantized DNNs , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[215] Weisheng Xu,et al. Fully integer-based quantization for mobile convolutional neural network inference , 2021, Neurocomputing.
[216] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.
[217] W. R. Bennett,et al. Spectra of quantized signals , 1948, Bell Syst. Tech. J..
[218] Dongsoo Lee,et al. FleXOR: Trainable Fractional Quantization , 2020, NeurIPS.
[219] Kurt Keutzer,et al. SqueezeNext: Hardware-Aware Neural Network Design , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[220] Rishidev Chaudhuri,et al. Computational principles of memory , 2016, Nature Neuroscience.
[221] Xianglong Liu,et al. BiPointNet: Binary Neural Network for Point Clouds , 2020, ICLR.
[222] Michael Woodford,et al. Discrete Adjustment to a Changing Environment: Experimental Evidence , 2016, SSRN Electronic Journal.
[223] Raghuraman Krishnamoorthi,et al. Quantizing deep convolutional networks for efficient inference: A whitepaper , 2018, ArXiv.
[224] Masahiro Masuda,et al. Efficient Execution of Quantized Deep Learning Models: A Compiler Approach , 2020, ArXiv.
[225] Elad Hoffer,et al. The Knowledge Within: Methods for Data-Free Model Compression , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[226] William Equitz,et al. A new vector quantization clustering algorithm , 1989, IEEE Trans. Acoust. Speech Signal Process..
[227] Jae-Joon Han,et al. Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[228] Shenghuo Zhu,et al. Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM , 2017, AAAI.
[229] Vishnu Naresh Boddeti,et al. Local Binary Convolutional Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[230] Quoc V. Le,et al. Efficient Neural Architecture Search via Parameter Sharing , 2018, ICML.
[231] Gu-Yeon Wei,et al. Structured Compression by Weight Encryption for Unstructured Pruning and Quantization , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[232] Kurt Keutzer,et al. HAWQV3: Dyadic Neural Network Quantization , 2020, ICML.
[233] Yu Bai,et al. ProxQuant: Quantized Neural Networks via Proximal Operators , 2018, ICLR.
[234] Amos J. Storkey,et al. Moonshine: Distilling with Cheap Convolutions , 2017, NeurIPS.
[235] Jack Xin,et al. Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets , 2019, ICLR.
[236] Vahid Partovi Nia,et al. Adaptive Binary-Ternary Quantization , 2019, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[237] B. Riemann. Ueber die Darstellbarkeit einer Function durch eine trigonometrische Reihe , 1867 .
[238] Dilin Wang,et al. AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[239] Ran El-Yaniv,et al. Binarized Neural Networks , 2016, ArXiv.
[240] Bo Chen,et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[241] Frank Rosenblatt,et al. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms , 1963 .
[242] Luca Benini,et al. Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations , 2017, NIPS.
[243] Ebru Arisoy,et al. Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[244] Weifeng Zhang,et al. Simple Augmentation Goes a Long Way: ADRL for DNN Quantization , 2021, ICLR.
[245] Nicholas D. Lane,et al. Degree-Quant: Quantization-Aware Training for Graph Neural Networks , 2021, ICLR.
[246] Yoshua Bengio,et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.
[247] Deliang Fan,et al. Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network Using Truncated Gaussian Approximation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[248] Kurt Keutzer,et al. CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs , 2021, FPGA.
[249] Alexander Finkelstein,et al. Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization , 2019, ICML.
[250] Christophe Garcia,et al. Simplifying ConvNets for Fast Learning , 2012, ICANN.
[251] Gang Hua,et al. How to Train a Compact Binary Neural Network with High Accuracy? , 2017, AAAI.
[252] Nicu Sebe,et al. Binary Neural Networks: A Survey , 2020, Pattern Recognit..
[253] Xiangyu Zhang,et al. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.
[254] Moshe Wasserblat,et al. Q8BERT: Quantized 8Bit BERT , 2019, 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[255] Michael W. Mahoney,et al. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT , 2019, AAAI.
[256] Yurong Chen,et al. Network Sketching: Exploiting Binary Structure in Deep CNNs , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[257] Enhua Wu,et al. Training Binary Neural Networks through Learning with Noisy Supervision , 2020, ICML.
[258] Hongbin Zha,et al. Alternating Multi-bit Quantization for Recurrent Neural Networks , 2018, ICLR.
[259] Dipankar Das,et al. Mixed Precision Training With 8-bit Floating Point , 2019, ArXiv.
[260] Jie Lin,et al. OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization , 2021, AAAI.
[261] Ying Wang,et al. Differentiable Joint Pruning and Quantization for Hardware Efficiency , 2020, ECCV.
[262] L. Pinneo. On noise in the nervous system. , 1966, Psychological review.
[263] Bo Chen,et al. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[264] Kurt Keutzer,et al. I-BERT: Integer-only BERT Quantization , 2021, ICML.
[265] Nicholas D. Lane,et al. An Empirical study of Binary Neural Networks' Optimisation , 2018, ICLR.
[266] Yang Liu,et al. Two-Step Quantization for Low-bit Neural Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[267] C. Collin,et al. An Introduction to Natural Computation , 1998, Trends in Cognitive Sciences.
[268] Philip Heng Wai Leong,et al. SYQ: Learning Symmetric Quantization for Efficient Deep Neural Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[269] C. E. Shannon. A Mathematical Theory of Communication , 1948, Bell Syst. Tech. J..
[270] Vivek K. Goyal,et al. A framework for Bayesian optimality of psychophysical laws , 2012, Journal of Mathematical Psychology.
[271] Jinwoo Shin,et al. Lookahead: a Far-Sighted Alternative of Magnitude-based Pruning , 2020, ICLR.
[272] Yann LeCun,et al. Optimal Brain Damage , 1989, NIPS.
[273] Yan Lu,et al. Relational Knowledge Distillation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[274] Dacheng Tao,et al. Learning from Multiple Teacher Networks , 2017, KDD.
[275] Tom Goldstein,et al. WrapNet: Neural Net Inference with Ultra-Low-Resolution Arithmetic , 2020, ArXiv.