Dataset Inference for Self-Supervised Models

Self-supervised models are increasingly prevalent in machine learning (ML) because they reduce the need for expensive labeled data. Because of their versatility in downstream applications, they are increasingly offered as a service through public APIs. At the same time, these encoder models are particularly vulnerable to model stealing attacks because they output high-dimensional vector representations. Yet encoders remain undefended: existing mitigation strategies for stealing attacks focus on supervised learning. We introduce a new dataset inference defense, which uses the private training set of the victim encoder to attribute ownership in the event of stealing. The intuition is that, for an encoder stolen from the victim, the log-likelihood of its output representations is higher on the victim’s training data than on test data, whereas for an independently trained encoder it is not. We compute this log-likelihood using density estimation models. As part of our evaluation, we also propose measuring the fidelity of stolen encoders and quantifying the effectiveness of theft detection without involving downstream tasks; instead, we leverage mutual information and distance measurements. Our extensive empirical results in the vision domain demonstrate that dataset inference is a promising direction for defending self-supervised models against model stealing.
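The intuition above can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: a single diagonal Gaussian stands in for the density estimation model, the train-versus-test comparison uses a Welch t-statistic, the data split (one part of the training representations to fit the estimator, a held-out part to score) is one plausible protocol, and the representations are synthetic draws rather than outputs of a real encoder. All function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def fit_diag_gaussian(reps):
    """Fit an axis-aligned Gaussian to representations.

    Assumption: a simple stand-in for a richer density estimator.
    """
    dims = list(zip(*reps))  # one tuple of values per dimension
    mus = [mean(d) for d in dims]
    variances = [variance(d) + 1e-6 for d in dims]  # small floor for stability
    return mus, variances


def log_likelihood(rep, mus, variances):
    """Log-density of one representation under the fitted Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(rep, mus, variances)
    )


def welch_t(a, b):
    """Welch t-statistic for mean(a) - mean(b)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def dataset_inference_score(train_fit, train_eval, test):
    """Compare log-likelihoods of held-out training vs. test representations.

    A large positive value suggests the suspect encoder treats the private
    training data differently from unseen data.
    """
    mus, variances = fit_diag_gaussian(train_fit)
    ll_train = [log_likelihood(r, mus, variances) for r in train_eval]
    ll_test = [log_likelihood(r, mus, variances) for r in test]
    return welch_t(ll_train, ll_test)


def sample_reps(rng, n, dim, scale):
    """Synthetic representations: isotropic Gaussian draws (illustration only)."""
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
DIM, N = 8, 500

# Assumed "stolen" case: training representations are simulated as more
# concentrated (scale 0.7) than test representations (scale 1.0).
t_stolen = dataset_inference_score(
    sample_reps(rng, N, DIM, 0.7),
    sample_reps(rng, N, DIM, 0.7),
    sample_reps(rng, N, DIM, 1.0),
)

# Assumed "independent" case: training and test representations share
# the same distribution, so no systematic gap is expected.
t_independent = dataset_inference_score(
    sample_reps(rng, N, DIM, 1.0),
    sample_reps(rng, N, DIM, 1.0),
    sample_reps(rng, N, DIM, 1.0),
)

print(f"t-statistic (simulated stolen encoder):      {t_stolen:.2f}")
print(f"t-statistic (simulated independent encoder): {t_independent:.2f}")
```

In this toy setting the simulated stolen encoder yields a much larger statistic than the independent one. With a real encoder, the three inputs would be its representations of the defender's training and test images, and the decision threshold would need to be calibrated.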
