Attacks on Digital Watermarks for Deep Neural Networks

Training deep neural networks is a computationally expensive task. Furthermore, models are often derived from proprietary datasets that have been carefully prepared and labelled. Hence, creators of deep learning models want to protect their models against intellectual property theft. However, this is not always possible, since the model may, e.g., be embedded in a mobile app for fast response times. As a countermeasure watermarks for deep neural networks have been developed that embed secret information into the model. This information can later be retrieved by the creator to prove ownership. Uchida et al. proposed the first such watermarking method. The advantage of their scheme is that it does not compromise the accuracy of the model prediction. However, in this paper we show that their technique modifies the statistical distribution of the model. Using this modification we can not only detect the presence of a watermark, but even derive its embedding length and use this information to remove the watermark by overwriting it. We show analytically that our detection algorithm follows consequentially from their embedding algorithm and propose a possible countermeasure. Our findings shall help to refine the definition of undetectability of watermarks for deep neural networks.

[1]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[2]  Jiri Fridrich,et al.  Image watermarking for tamper detection , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[3]  Stefan Katzenbeisser,et al.  Private Evaluation of Decision Trees using Sublinear Cost , 2019, Proc. Priv. Enhancing Technol..

[4]  Fan Zhang,et al.  Stealing Machine Learning Models via Prediction APIs , 2016, USENIX Security Symposium.

[5]  Michael Naehrig,et al.  CryptoNets: applying neural networks to encrypted data with high throughput and accuracy , 2016, ICML 2016.

[6]  Shin'ichi Satoh,et al.  Embedding Watermarks into Deep Neural Networks , 2017, ICMR.

[7]  Nasir Memon,et al.  Secret and public key image watermarking schemes for image authentication and ownership verification , 2001, IEEE Trans. Image Process..

[8]  Anders Krogh,et al.  A Simple Weight Decay Can Improve Generalization , 1991, NIPS.

[9]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Hui Wu,et al.  Protecting Intellectual Property of Deep Neural Networks with Watermarking , 2018, AsiaCCS.

[11]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .