Deep Reinforcement Learning-based Text Anonymization against Private-Attribute Inference

User-generated textual data is rich in content and has been used in many user behavioral modeling tasks. However, it can also leak private-attribute information, such as age and location, that users may not want to disclose. Users' privacy concerns require data publishers to protect this information, and one effective way to do so is to anonymize the textual data. In this paper, we study the problem of textual data anonymization and propose a novel Reinforcement Learning-based Text Anonymizor, RLTA, which addresses private-attribute leakage while preserving the utility of the textual data. Our approach first extracts a latent representation of the original text with respect to a given task, then uses deep reinforcement learning to automatically learn a strategy for manipulating the text representation based on the privacy and utility feedback it receives. Experiments show that the approach is effective at preserving both privacy and utility.
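The manipulate-and-score loop described above can be illustrated with a toy sketch. This is not the paper's formulation: the linear "attacker" and "utility" classifiers, the reward weighting `alpha`, the step size `delta`, and all function names are assumptions made for illustration, and a simple greedy search stands in for the learned deep reinforcement learning policy.

```python
import math

def confidence(weights, rep):
    """Toy linear classifier: probability of the positive class (assumed model)."""
    z = sum(w * x for w, x in zip(weights, rep))
    return 1.0 / (1.0 + math.exp(-z))

def reward(rep, attacker_w, utility_w, alpha=0.5):
    """Hypothetical reward mixing privacy and utility feedback."""
    # Privacy is 1.0 when the attacker is at chance (0.5) and 0.0 when it is certain.
    privacy = 1.0 - 2.0 * abs(confidence(attacker_w, rep) - 0.5)
    utility = confidence(utility_w, rep)
    return alpha * privacy + (1.0 - alpha) * utility

def anonymize(rep, attacker_w, utility_w, steps=20, delta=0.1):
    """Greedy stand-in for a learned policy.

    Each action shifts one element of the representation by +/- delta;
    the action with the highest reward is kept until no action improves it.
    """
    rep = list(rep)
    best = reward(rep, attacker_w, utility_w)
    for _ in range(steps):
        candidates = []
        for i in range(len(rep)):
            for sign in (1.0, -1.0):
                trial = rep.copy()
                trial[i] += sign * delta
                candidates.append((reward(trial, attacker_w, utility_w), trial))
        top_reward, top_rep = max(candidates, key=lambda c: c[0])
        if top_reward <= best:
            break  # no single edit improves the reward
        best, rep = top_reward, top_rep
    return rep, best

# Illustrative inputs: dimension 0 leaks the private attribute,
# dimension 1 carries the task signal.
original = [1.0, 1.0]
attacker_w = [2.0, 0.0]
utility_w = [0.0, 2.0]

anonymized, final_reward = anonymize(original, attacker_w, utility_w)
print("attacker confidence before:", confidence(attacker_w, original))
print("attacker confidence after: ", confidence(attacker_w, anonymized))
print("utility confidence after:  ", confidence(utility_w, anonymized))
```

In this toy setting the search lowers the leaking dimension and raises the task dimension, which mirrors the intended trade-off; a learned policy would replace the exhaustive per-step search over actions.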
