VerIDeep: Verifying Integrity of Deep Neural Networks through Sensitive-Sample Fingerprinting

Deep learning has become increasingly popular, and numerous cloud-based services now help customers develop and deploy deep learning applications. Meanwhile, various attack techniques have been discovered that stealthily compromise a model's integrity. When a cloud customer deploys a deep learning model in the cloud and serves it to end-users, it is important for the customer to be able to verify that the deployed model has not been tampered with and that its integrity is protected. We propose a new low-cost and self-served methodology for customers to verify that the model deployed in the cloud is intact, while having only black-box access (e.g., via APIs) to the deployed model. Customers can detect arbitrary changes to their deep learning models. Specifically, we define \texttt{Sensitive-Sample} fingerprints, which are a small set of transformed inputs that make the model outputs sensitive to the model's parameters. Even small weight changes are clearly reflected in the model outputs and can be observed by the customer. Our experiments on different types of model integrity attacks show that we can detect model integrity breaches with high accuracy ($>$99\%) and low overhead ($<$10 black-box model accesses).
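The fingerprint generation can be viewed as a gradient-based search for inputs whose outputs are maximally sensitive to the model weights. The snippet below is a minimal sketch of that idea in PyTorch, not the paper's exact algorithm: the model, seed input \texttt{x0}, choice of weight tensor, and all hyperparameters are illustrative assumptions.

\begin{verbatim}
# Minimal sketch (assumed, not the paper's exact algorithm): craft a
# "sensitive sample" by maximizing the sensitivity of the model output
# to a chosen weight tensor via gradient ascent on the input.
import torch

def make_sensitive_sample(model, x0, weight, steps=100, lr=0.01):
    """Ascend on ||d(output)/d(weight)||^2 with respect to the input x."""
    x = x0.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        y = model(x)  # forward pass, shape (1, num_classes)
        # Sensitivity proxy: squared norm of the gradient of the output
        # (summed over classes) w.r.t. the selected weight tensor.
        grads = torch.autograd.grad(y.sum(), weight, create_graph=True)[0]
        sensitivity = grads.pow(2).sum()
        (-sensitivity).backward()  # maximize sensitivity
        optimizer.step()
        x.data.clamp_(0.0, 1.0)  # optionally project back to a valid input range
    return x.detach()

# Hypothetical usage: the customer generates fingerprints locally,
# records the model's outputs on them, and later queries the deployed
# model (black-box) on the same fingerprints to detect weight changes.
# model = torchvision.models.resnet18(pretrained=True).eval()
# w = dict(model.named_parameters())["fc.weight"]
# fingerprint = make_sensitive_sample(model, x0, w)
\end{verbatim}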
