Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning

Semi-supervised learning (SSL) leverages both labeled and unlabeled data to train machine learning (ML) models. State-of-the-art SSL methods can match the performance of supervised learning while requiring far fewer labeled samples. However, most existing work focuses on improving the performance of SSL. In this work, we take a different angle by studying the training data privacy of SSL. Specifically, we propose the first data augmentation-based membership inference attacks against ML models trained by SSL. Given a data sample and black-box access to a model, the goal of a membership inference attack is to determine whether the data sample belongs to the training dataset of the model. Our evaluation shows that the proposed attack consistently outperforms existing membership inference attacks and achieves the best performance against models trained by SSL. Moreover, we uncover that the cause of membership leakage in SSL differs from the commonly believed cause in supervised learning, i.e., overfitting (the gap between training and testing accuracy). We observe that the SSL model generalizes well to the testing data (with almost zero overfitting) but "memorizes" the training data by producing more confident predictions on it, regardless of their correctness. We also explore early stopping as a countermeasure against membership inference attacks on SSL. The results show that early stopping can mitigate the attack, but at the cost of degrading the model's utility.
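To make the augmentation-based attack idea concrete, below is a minimal sketch in PyTorch: it queries a target model on several augmented views of a candidate sample and thresholds a confidence-based membership score, exploiting the observation above that SSL models predict more confidently on training members. The specific augmentations, the mean-minus-variance score, the helper names `membership_score` and `infer_membership`, and the `threshold` value are all illustrative assumptions, not the paper's exact Semi-Leak pipeline (which would, e.g., calibrate or train an attack model on shadow-model outputs).

```python
# Sketch of an augmentation-based membership inference attack, assuming
# black-box access to a `target_model` that maps an image batch to logits.
# All attack details here are assumptions for illustration.
import torch
import torch.nn.functional as F
import torchvision.transforms as T

# Example augmentations; assumes 32x32 inputs (e.g., CIFAR-10).
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),
])

@torch.no_grad()
def membership_score(target_model, x, k=10):
    """Query the model on k augmented views of x and score how
    confidently (and consistently) it predicts across the views.
    Training members tend to receive more confident predictions."""
    views = torch.stack([augment(x) for _ in range(k)])  # (k, C, H, W)
    probs = F.softmax(target_model(views), dim=1)        # (k, num_classes)
    top_conf = probs.max(dim=1).values                   # per-view confidence
    # High mean confidence with low variance across views -> likely a member.
    return top_conf.mean().item() - top_conf.var().item()

def infer_membership(target_model, x, threshold=0.9):
    # In practice the threshold would be calibrated, e.g., on shadow models.
    return membership_score(target_model, x) > threshold

# Usage with a placeholder model and a random sample:
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(3, 32, 32)
print(infer_membership(model, x))
```

A fixed threshold is only a stand-in: because the paper reports near-zero overfitting in SSL models, separating members from non-members hinges on how the confidence statistics are calibrated, which is why a learned attack classifier or shadow-model calibration would typically replace the simple cutoff used here.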
