DAWN: Dynamic Adversarial Watermarking of Neural Networks

Training machine learning (ML) models is expensive in terms of computational power, labeled data, and human expertise. ML models therefore constitute business value for their owners. Embedding digital watermarks during model training allows a model owner to later identify their models in case of theft or misuse. However, model functionality can also be stolen via model extraction, where an adversary trains a surrogate model using the results returned from the prediction API of the original model. Recent work has shown that model extraction is a realistic threat. Existing watermarking schemes are ineffective against model extraction because it is the adversary who trains the surrogate model. In this paper, we introduce DAWN (Dynamic Adversarial Watermarking of Neural Networks), the first approach to use watermarking to deter theft via model extraction. Unlike prior watermarking schemes, DAWN does not impose changes to the training process; instead, it operates at the prediction API of the protected model, dynamically changing the responses for a small subset of queries (e.g., 0.5%) from API clients. This set of altered responses constitutes a watermark that becomes embedded in any surrogate model a client trains using its queries. We show that DAWN is resilient against two state-of-the-art model extraction attacks: it effectively watermarks all extracted surrogate models, allowing model owners to reliably demonstrate ownership (with confidence greater than 1 − 2⁻⁶⁴) while incurring a negligible loss of prediction accuracy (0.03-0.5%).
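The abstract describes the mechanism only at a high level; the sketch below illustrates one way such an API-side watermarking layer could look. It is a minimal illustration, not the paper's implementation: the keyed HMAC used to select queries, the DawnPredictionAPI class, the label-perturbation rule, and the fixed verification threshold are all assumptions made for this example.

import hashlib
import hmac


class DawnPredictionAPI:
    """Sketch of a prediction-API wrapper that alters responses for a small,
    deterministically chosen subset of each client's queries (illustrative only)."""

    def __init__(self, model, secret_key: bytes, num_classes: int, rate: float = 0.005):
        self.model = model            # assumed callable: bytes -> int class label
        self.key = secret_key         # secret known only to the model owner
        self.num_classes = num_classes
        self.rate = rate              # fraction of queries to watermark, e.g. 0.5%
        self.trigger_set = []         # recorded (query, returned wrong label) pairs

    def _digest(self, client_id: str, query: bytes) -> int:
        # A keyed hash makes the selection deterministic per (client, query)
        # but unpredictable to the client, who does not know the key.
        mac = hmac.new(self.key, client_id.encode() + query, hashlib.sha256)
        return int.from_bytes(mac.digest()[:8], "big")

    def predict(self, client_id: str, query: bytes) -> int:
        d = self._digest(client_id, query)
        true_label = self.model(query)
        if (d % 1_000_000) / 1_000_000 < self.rate:
            # Return an incorrect but deterministic label: repeating the same
            # query always yields the same wrong answer, so the perturbation
            # cannot be averaged away by re-querying.
            offset = 1 + d % (self.num_classes - 1)
            wrong_label = (true_label + offset) % self.num_classes
            self.trigger_set.append((query, wrong_label))
            return wrong_label
        return true_label

    def verify(self, suspect_model, threshold: float = 0.5) -> bool:
        # Claim ownership if the suspect model reproduces "enough" of the
        # watermark labels; the threshold here is a placeholder, whereas the
        # paper derives its decision rule from the stated confidence bound
        # (greater than 1 - 2^-64).
        if not self.trigger_set:
            return False
        matches = sum(1 for q, lbl in self.trigger_set if suspect_model(q) == lbl)
        return matches / len(self.trigger_set) >= threshold

A client that uses its API responses as training labels for a surrogate model thereby embeds these (query, wrong-label) pairs into the surrogate, which is what later allows verification against a suspect model.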
