Open-sourced Dataset Protection via Backdoor Watermarking

The rapid development of deep learning has benefited from the release of high-quality open-sourced datasets ($e.g.$, ImageNet), which allow researchers to easily verify the effectiveness of their algorithms. Almost all existing open-sourced datasets stipulate that they may be used only for academic or educational purposes rather than commercial ones, yet there is still no good way to enforce this requirement. In this paper, we propose a \emph{backdoor-embedding-based dataset watermarking} method that protects an open-sourced image-classification dataset by verifying whether it was used to train a third-party model. Specifically, the proposed method consists of two main processes: \emph{dataset watermarking} and \emph{dataset verification}. We adopt classical poisoning-based backdoor attacks ($e.g.$, BadNets) for dataset watermarking, $i.e.$, we generate poisoned samples by adding a certain trigger ($e.g.$, a local patch) onto some benign samples and relabeling them with a pre-defined target class. Based on this backdoor-based watermarking, we design a hypothesis-test-guided method for dataset verification, which compares the posterior probabilities, on the target class, that the suspicious third-party model assigns to benign samples and to their watermarked counterparts ($i.e.$, images containing the trigger). Experiments on several benchmark datasets verify the effectiveness of the proposed method.
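To make the watermarking step concrete, here is a minimal sketch of BadNets-style dataset watermarking: a local patch trigger is stamped onto a randomly chosen fraction of images, which are then relabeled with the target class. It assumes images stored as an (N, H, W, C) uint8 array; the function name `watermark_dataset` and all parameter defaults are illustrative choices, not taken from the paper.

```python
import numpy as np

def watermark_dataset(images, labels, target_class, poison_rate=0.1,
                      patch_size=3, patch_value=255, seed=0):
    """Stamp a square trigger patch onto a random subset and relabel it."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Place the trigger in the bottom-right corner of each chosen image.
    images[idx, -patch_size:, -patch_size:, :] = patch_value
    # Relabel the poisoned samples with the pre-defined target class.
    labels[idx] = target_class
    return images, labels, idx
```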
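The verification step can likewise be sketched as a pairwise hypothesis test: query the suspicious model with benign images and with the same images carrying the trigger, then test whether the target-class posterior increases significantly. In line with the Wilcoxon reference [32], this sketch uses a one-sided Wilcoxon signed-rank test; `model_predict` is an assumed black-box prediction interface, and the significance level and trigger placement are illustrative rather than prescribed by the paper.

```python
from scipy import stats

def verify_dataset(model_predict, benign_images, target_class, alpha=0.05):
    """Return (watermark_detected, p_value) for a suspicious model.

    `model_predict` is assumed to map an (N, H, W, C) uint8 batch to an
    (N, num_classes) array of softmax posteriors.
    """
    triggered = benign_images.copy()
    triggered[:, -3:, -3:, :] = 255  # the same trigger used for watermarking
    p_benign = model_predict(benign_images)[:, target_class]
    p_trigger = model_predict(triggered)[:, target_class]
    # H0: the trigger does not raise the target-class posterior.
    # Rejecting H0 suggests the model was trained on the watermarked dataset.
    _, p_value = stats.wilcoxon(p_trigger, p_benign, alternative='greater')
    return p_value < alpha, p_value
```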

[1] Latanya Sweeney, et al. k-Anonymity: A Model for Protecting Privacy, 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst.

[2] Dan Zhao, et al. A New Robust Approach for Reversible Database Watermarking with Distortion Control, 2019, IEEE Transactions on Knowledge and Data Engineering.

[3] Ming Zhou, et al. HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization, 2019, ACL.

[4] Xu Tan, et al. FastSpeech: Fast, Robust and Controllable Text to Speech, 2019, NeurIPS.

[5] Siddharth Garg, et al. BadNets: Evaluating Backdooring Attacks on Deep Neural Networks, 2019, IEEE Access.

[6] Yunfei Liu, et al. Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks, 2020, ECCV.

[7] Zoe L. Jiang, et al. A Robust and Reversible Watermarking Technique for Relational Dataset Based on Clustering, 2019, 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications / 13th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE).

[8] Minhui Xue, et al. Invisible Backdoor Attacks on Deep Neural Networks via Steganography and Regularization, 2019.

[9] Baoyuan Wu, et al. Rethinking the Trigger of Backdoor Attack, 2020, arXiv.

[10] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[11] Jiguo Li, et al. Hierarchical attribute based encryption with continuous leakage-resilience, 2019, Inf. Sci.

[12] Kai Zhao, et al. Protecting Trajectory From Semantic Attack Considering $k$-Anonymity, $l$-Diversity, and $t$-Closeness, 2019, IEEE Trans. Netw. Serv. Manag.

[13] Graham Neubig, et al. The Return of Lexical Dependencies: Neural Lexicalized PCFGs, 2020, Transactions of the Association for Computational Linguistics.

[14] Dong Yu, et al. Transferring Source Style in Non-Parallel Voice Conversion, 2020, INTERSPEECH.

[15] Chen Huang, et al. Deep Imbalanced Learning for Face Recognition and Attribute Prediction, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16] Akihiko Ohsuga, et al. Anonymization of Sensitive Quasi-Identifiers for l-Diversity and t-Closeness, 2019, IEEE Transactions on Dependable and Secure Computing.

[17] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Chin-Chen Chang, et al. Blockchain based searchable encryption for electronic health record sharing, 2019, Future Gener. Comput. Syst.

[19] Larry S. Davis, et al. ACE: Adapting to Changing Environments for Semantic Segmentation, 2019, IEEE/CVF International Conference on Computer Vision (ICCV).

[20] Aleksander Madry, et al. Adversarial Examples Are Not Bugs, They Are Features, 2019, NeurIPS.

[21] Muttukrishnan Rajarajan, et al. Protection of medical images and patient related information in healthcare: Using an intelligent and reversible watermarking technique, 2017, Appl. Soft Comput.

[22] Dawn Xiaodong Song, et al. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning, 2017, arXiv.

[23] Yicong Zhou, et al. Cosine-transform-based chaotic system for image encryption, 2019, Inf. Sci.

[24] Sanjeev Khudanpur, et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition, 2018, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[25] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[26] Johannes Stallkamp, et al. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition, 2012, Neural Networks.

[27] Yong Jiang, et al. Backdoor Learning: A Survey, 2020, IEEE Transactions on Neural Networks and Learning Systems.

[28] Ankur Srivastava, et al. A Survey on Neural Trojans, 2020, 21st International Symposium on Quality Electronic Design (ISQED).

[29] David G. Kendall, et al. Introduction to Mathematical Statistics, 1947, Nature.

[30] Vitaly Shmatikov, et al. How To Backdoor Federated Learning, 2018, AISTATS.

[31] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[32] F. Wilcoxon. Individual Comparisons by Ranking Methods, 1945.