Adversarial frontier stitching for remote neural network watermarking

The state-of-the-art performance of deep learning models comes at a high cost for companies and institutions, due to tedious data collection and heavy processing requirements. Recently, Nagai et al. (Int J Multimed Inf Retr 7(1):3–16, 2018) and Uchida et al. (Embedding watermarks into deep neural networks, ICMR, 2017) proposed to watermark convolutional neural networks for image classification by embedding information into their weights. While this is clear progress toward model protection, this technique only allows extracting the watermark from a network that one can access locally and in its entirety. Instead, we aim at allowing the extraction of the watermark from a neural network (or any other machine learning model) that is operated remotely and available through a service API. To this end, we propose to mark the model's action itself, slightly tweaking its decision frontiers so that a set of specific queries conveys the desired information. In the present paper, we formally introduce the problem and propose a novel zero-bit watermarking algorithm that makes use of adversarial model examples. While limiting the loss of performance of the protected model, this algorithm allows subsequent extraction of the watermark using only a few queries. We experimented with the approach on three neural networks designed for image classification, in the context of the MNIST digit recognition task.
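To make the abstract's two steps concrete, here is a minimal sketch of the idea: craft perturbed inputs near the decision frontier to serve as a secret key, and later verify a remote, API-only model by counting how many key queries it answers as expected. Everything here is an assumption for illustration, not the paper's reference implementation: the toy softmax-regression stand-in, the names `make_key`, `verify_watermark` and `remote_predict`, and the binomial acceptance threshold (the paper derives its own tolerance θ).

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)

# Toy differentiable classifier (softmax regression on flattened 28x28 images).
# It stands in for the watermarked network; a deployed model would expose the
# same "input -> predicted class" interface through its service API.
W = rng.normal(size=(784, 10)) * 0.01
b = np.zeros(10)

def predict(x):
    """Predicted class of a flattened image x with values in [0, 1]."""
    return int(np.argmax(x @ W + b))

def input_gradient(x, y):
    """Gradient of the cross-entropy loss w.r.t. the input, used by FGSM."""
    logits = x @ W + b
    p = np.exp(logits - logits.max())
    p /= p.sum()
    p[y] -= 1.0                      # dL/dlogits = softmax - one_hot(y)
    return W @ p                     # chain rule back to the input

def make_key(images, labels, eps=0.1, size=8):
    """Craft FGSM perturbations that sit close to the decision frontier.
    The (perturbed input, label) pairs form the secret watermark key; the
    owner then fine-tunes ("stitches") the model so that every key input
    is classified with its designated label (selection rule is illustrative)."""
    key = []
    for x, y in zip(images, labels):
        x_adv = np.clip(x + eps * np.sign(input_gradient(x, y)), 0.0, 1.0)
        key.append((x_adv, y))
        if len(key) == size:
            break
    return key

def binom_cdf(k, n, p=0.5):
    """P[Binomial(n, p) <= k], used to bound false positives."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def verify_watermark(remote_predict, key, p_value=0.05):
    """Zero-bit extraction: query the suspected remote model with the key and
    claim the watermark only if an unmarked model (null model: each key query
    matched with probability 1/2, an illustrative choice) would be this
    consistent with the key with probability below p_value."""
    mismatches = sum(remote_predict(x) != y for x, y in key)
    candidates = [k for k in range(len(key) + 1)
                  if binom_cdf(k, len(key)) <= p_value]
    theta = max(candidates, default=-1)   # -1: key too small to conclude
    return mismatches <= theta
```

In this sketch, `remote_predict` would simply wrap the suspected service API; before the marked model is fine-tuned on the key, the check is expected to fail, which is exactly why the stitching step is needed.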

[1] Binghui Wang, et al. Stealing Hyperparameters in Machine Learning, 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[2] Fan Zhang, et al. Stealing Machine Learning Models via Prediction APIs, 2016, USENIX Security Symposium.

[3] Miodrag Potkonjak, et al. Watermarking Deep Neural Networks for Embedded Systems, 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[4] Pan He, et al. Adversarial Examples: Attacks and Defenses for Deep Learning, 2017, IEEE Transactions on Neural Networks and Learning Systems.

[5] Mehmed M. Kantardzic, et al. Data Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains, 2017, Neurocomputing.

[6] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.

[7] C. F. Osborne, et al. A digital watermark, 1994, Proceedings of 1st International Conference on Image Processing.

[8] Farinaz Koushanfar, et al. DeepSigns: A Generic Watermarking Framework for IP Protection of Deep Learning Models, 2018, IACR Cryptol. ePrint Arch.

[9] Ben Y. Zhao, et al. Towards Graph Watermarks, 2015, COSN.

[10] Ewout van den Berg, et al. Some Insights into the Geometry and Training of Neural Networks, 2016, ArXiv.

[11] Patrick D. McDaniel, et al. On the (Statistical) Detection of Adversarial Examples, 2017, ArXiv.

[12] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.

[13] Frank Hartung, et al. Multimedia watermarking techniques, 1999, Proc. IEEE.

[14] Chuan-Yu Chang, et al. A neural-network-based robust watermarking scheme, 2005, 2005 IEEE International Conference on Systems, Man and Cybernetics.

[15] Shin'ichi Satoh, et al. Embedding Watermarks into Deep Neural Networks, 2017, ICMR.

[16] Benny Pinkas, et al. Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring, 2018, USENIX Security Symposium.

[17] Geoffrey E. Hinton, et al. A Simple Way to Initialize Recurrent Networks of Rectified Linear Units, 2015, ArXiv.

[18] Patrick D. McDaniel, et al. Cleverhans V0.1: an Adversarial Machine Learning Library, 2016, ArXiv.

[19] Erwan Le Merrer, et al. TamperNN: Efficient Tampering Detection of Deployed Neural Nets, 2019, 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE).

[20] Amit K. Roy-Chowdhury, et al. Adversarial Perturbations Against Real-Time Video Classification Systems, 2018, NDSS.

[21] Shin'ichi Satoh, et al. Digital watermarking for deep neural networks, 2018, International Journal of Multimedia Information Retrieval.

[22] Terrance E. Boult, et al. Are Accuracy and Robustness Correlated?, 2016, 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA).

[23] Seong Joon Oh, et al. Towards Reverse-Engineering Black-Box Neural Networks, 2017, ICLR.

[24] Dan Boneh, et al. Ensemble Adversarial Training: Attacks and Defenses, 2017, ICLR.

[25] Ananthram Swami, et al. Practical Black-Box Attacks against Machine Learning, 2016, AsiaCCS.

[26] Hui Wu, et al. Protecting Intellectual Property of Deep Neural Networks with Watermarking, 2018, AsiaCCS.

[27] David A. Wagner, et al. Audio Adversarial Examples: Targeted Attacks on Speech-to-Text, 2018, 2018 IEEE Security and Privacy Workshops (SPW).

[28] Tom Goldstein, et al. Are adversarial examples inevitable?, 2018, ICLR.

[29] Ronald M. Summers, et al. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning, 2016, IEEE Transactions on Medical Imaging.

[30] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[31] Seyed-Mohsen Moosavi-Dezfooli, et al. Universal Adversarial Perturbations, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32] Ananthram Swami, et al. The Limitations of Deep Learning in Adversarial Settings, 2015, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[33] Nick Antonopoulos, et al. An Empirical Evaluation of Adversarial Robustness under Transfer Learning, 2019, ArXiv.

[34] Wen-Chuan Lee, et al. Trojaning Attack on Neural Networks, 2018, NDSS.

[35] Valentina Emilia Balas, et al. Stealing Neural Networks via Timing Side Channels, 2018, ArXiv.