Obfuscation for Privacy-preserving Syntactic Parsing

The goal of homomorphic encryption is to encrypt data such that another party can operate on it without being explicitly exposed to the content of the original data. We introduce an idea for a privacy-preserving transformation on natural language data, inspired by homomorphic encryption. Our primary tool is obfuscation, which relies on the properties of natural language. Specifically, a given English text is obfuscated using a neural model that aims to preserve the syntactic relationships of the original sentence, so that the obfuscated sentence can be parsed instead of the original one. The model works at the word level and learns to obfuscate each word separately by changing it into a new word that has a similar syntactic role. Text obfuscated by our model leads to better performance on three syntactic parsers (two dependency parsers and one constituency parser) than an upper-bound random substitution baseline. More specifically, the results demonstrate that as more terms are obfuscated (selected by their part of speech), parsing performance under the substitution upper bound degrades significantly, while parsing performance under the neural model remains relatively high. This is achieved without much sacrifice of privacy compared to the random substitution upper bound. We further analyze the results and find that the substituted words have similar syntactic properties to the original words, but different semantic content.
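To make the random substitution baseline concrete, the following is a minimal sketch of word-level substitution restricted to chosen part-of-speech categories. It is an illustration only and not the implementation used in this work: the toy vocabulary `VOCAB_BY_POS`, the tag names, the function `random_substitute`, and the choice to draw replacements from words with the same tag are all assumptions made for this example.

```python
import random

# Hypothetical toy vocabulary grouped by part-of-speech tag (an assumption,
# not taken from the paper).
VOCAB_BY_POS = {
    "NOUN": ["table", "river", "engine", "doctor"],
    "VERB": ["opened", "carried", "painted", "watched"],
    "ADJ": ["green", "heavy", "quiet", "narrow"],
}


def random_substitute(tagged_tokens, pos_to_obfuscate, rng):
    """Replace each word whose tag is selected with a random, different
    word carrying the same tag; leave every other word unchanged."""
    output = []
    for word, pos in tagged_tokens:
        # Candidate replacements: same tag, excluding the original word.
        candidates = [w for w in VOCAB_BY_POS.get(pos, []) if w != word.lower()]
        if pos in pos_to_obfuscate and candidates:
            output.append(rng.choice(candidates))
        else:
            output.append(word)
    return output


# A sentence that is already tagged; the tags here are written by hand.
sentence = [
    ("The", "DET"), ("doctor", "NOUN"), ("opened", "VERB"),
    ("the", "DET"), ("heavy", "ADJ"), ("door", "NOUN"),
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
obfuscated = random_substitute(sentence, {"NOUN"}, rng)
print(" ".join(obfuscated))
```

Passing a larger set such as `{"NOUN", "VERB", "ADJ"}` obfuscates more categories, which corresponds to the setting in which more terms are obfuscated by their part of speech.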
