Privacy-aware supervised classification: An informative subspace based multi-objective approach

Abstract: Sharing a labelled dataset on a cloud platform, whether in raw form or as an abstract representation, can expose sensitive information in the data to an adversary. For example, in an emotion classification task on text, an adversary-agnostic abstract representation of the text may still allow an adversary to infer the demographics of the authors, such as their gender and age. In this paper, we propose a universal defense mechanism against such attempts to steal sensitive information from data shared on cloud platforms. Specifically, our method employs an informative subspace based multi-objective approach to obtain a sensitive-information-aware encoding of the data representation. Experiments on standard text and image datasets demonstrate that our approach reduces the effectiveness of the adversarial task (i.e., it better protects the sensitive information in the data) without significantly reducing the effectiveness of the primary task.
