Leveraging Unlabeled Data for Watermark Removal of Deep Neural Networks

Deep neural networks have achieved tremendous success in various fields; however, training these models from scratch can be computationally expensive and requires large amounts of training data. Therefore, recent work has explored different watermarking techniques to protect pre-trained deep neural networks from potential copyright infringement. Although several existing techniques can effectively embed such watermarks into DNNs, they may be vulnerable to adversaries who aim to remove the watermarks. In this work, we demonstrate that a carefully designed fine-tuning method enables an adversary with limited training data to effectively remove the watermarks without compromising the model's functionality. In particular, leveraging auxiliary unlabeled data significantly decreases the amount of labeled training data needed for effective watermark removal, even if the unlabeled data samples are not drawn from the same distribution as the benign data used for model evaluation.
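
To make the attack surface concrete, the sketch below shows one plausible way such a fine-tuning attack could be implemented: the adversary uses the watermarked model itself to pseudo-label auxiliary unlabeled data, then fine-tunes on the union of the small labeled set and the confident pseudo-labeled samples. This is a minimal sketch under assumptions not stated in the abstract: the PyTorch setting, the `pseudo_label` and `finetune_for_removal` helpers, the confidence threshold, and the optimizer settings are all illustrative, not the paper's exact procedure.

```python
# Hypothetical sketch: semi-supervised fine-tuning for watermark removal.
# Assumptions (not taken from the paper): PyTorch; a pre-trained watermarked
# classifier `model`; a small labeled dataset yielding (x, y) pairs; and a
# DataLoader over a TensorDataset of unlabeled inputs yielding (x,) batches.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset, ConcatDataset


def pseudo_label(model, unlabeled_loader, device, threshold=0.9):
    """Label auxiliary unlabeled samples with the watermarked model itself,
    keeping only confident predictions (the threshold is an assumption)."""
    model.eval()
    xs, ys = [], []
    with torch.no_grad():
        for (x,) in unlabeled_loader:
            probs = F.softmax(model(x.to(device)), dim=1)
            conf, pred = probs.max(dim=1)
            keep = conf >= threshold
            xs.append(x[keep.cpu()])
            ys.append(pred[keep].cpu())
    return TensorDataset(torch.cat(xs), torch.cat(ys))


def finetune_for_removal(model, labeled_ds, pseudo_ds, device,
                         epochs=20, lr=1e-3, batch_size=128):
    """Fine-tune on labeled + pseudo-labeled data; since the trigger/key set
    is never revisited, the watermark behavior tends to be forgotten while
    accuracy on benign data is maintained."""
    loader = DataLoader(ConcatDataset([labeled_ds, pseudo_ds]),
                        batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            opt.step()
    return model
```

The key design point this sketch illustrates is that the unlabeled data only needs to elicit reasonable predictions from the watermarked model; it need not come from the evaluation distribution, which is why auxiliary out-of-distribution data can substitute for most of the labeled training data.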
