Towards Understanding Model Quantization for Reliable Deep Neural Network Deployment

Deep Neural Networks (DNNs) have gained considerable attention over the past decades due to their astounding performance in applications such as natural language modeling, self-driving assistance, and source code understanding. As the field evolves rapidly, increasingly complex DNN architectures have been proposed, along with ever-larger sets of pre-trained parameters. A common way to run such DNN models on resource-limited, user-facing devices (e.g., mobile phones) is to compress them before deployment. However, recent research has demonstrated that model compression, e.g., model quantization, causes accuracy degradation as well as output disagreements on unseen data. Since unseen data frequently exhibit distribution shifts and often appear in the wild, the quality and reliability of quantized models are not guaranteed. In this paper, we conduct a comprehensive study to characterize and help users understand the behaviors of quantized models. Our study covers four datasets spanning images to text, eight DNN architectures including both feed-forward and recurrent neural networks, and 42 shifted sets with both synthetic and natural distribution shifts. The results reveal that 1) data with distribution shifts lead to more disagreements than data without; 2) quantization-aware training produces more stable models than standard, adversarial, and Mixup training; 3) disagreements often have closer top-1 and top-2 output probabilities, and Margin is a better indicator than other uncertainty metrics for distinguishing disagreements; 4) retraining on disagreements is of limited effectiveness in removing them. We release our code and models as a new benchmark for further study of model quantization.
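As a concrete illustration of the pipeline the abstract describes, the minimal sketch below quantizes a Keras classifier with TensorFlow Lite post-training quantization, flags inputs where the original and quantized models disagree, and ranks inputs by the Margin metric (top-1 minus top-2 output probability). This is an illustrative sketch, not the authors' released code; `model` and `x_shifted` are hypothetical placeholders for a trained Keras classifier (with a softmax output) and a possibly distribution-shifted input batch.

```python
# Minimal sketch: detect float-vs-quantized disagreements and score inputs
# by the Margin uncertainty metric. Assumes TensorFlow 2.x.
import numpy as np
import tensorflow as tf


def quantize(model: tf.keras.Model) -> bytes:
    """Apply default TFLite post-training quantization to a Keras model."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()


def tflite_predict(tflite_bytes: bytes, xs: np.ndarray) -> np.ndarray:
    """Run the quantized model one sample at a time, returning output probabilities."""
    interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    probs = []
    for x in xs:
        interpreter.set_tensor(inp["index"], x[None].astype(np.float32))
        interpreter.invoke()
        probs.append(interpreter.get_tensor(out["index"])[0])
    return np.asarray(probs)


def margin(probs: np.ndarray) -> np.ndarray:
    """Margin metric: top-1 minus top-2 probability per sample.
    Small margins signal inputs likely to trigger disagreements."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


# Usage (hypothetical model and data):
# quantized = quantize(model)
# p_orig = model.predict(x_shifted)                  # float-model probabilities
# p_quant = tflite_predict(quantized, x_shifted)     # quantized-model probabilities
# disagreements = np.argmax(p_orig, 1) != np.argmax(p_quant, 1)
# suspicious_first = np.argsort(margin(p_quant))     # ascending Margin order
```

Under this setup, finding 3) corresponds to disagreement inputs clustering at small Margin values, so sorting candidate inputs by ascending Margin surfaces likely disagreements first.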
