RECIPE: Applying Open Domain Question Answering to Privacy Policies

We describe our experience applying an open-domain question answering model (Chen et al., 2017) to an out-of-domain QA task: assisting in the analysis of companies' privacy policies. Specifically, the Relevant CI Parameters Extractor (RECIPE) seeks to answer questions posed by the theory of contextual integrity (CI) about the information flows described in privacy statements. These questions have a simple syntactic structure, and the answers are factoid or descriptive in nature. The model alone achieved an F1 score of 72.33, but combining its results with a neural dependency parser-based approach yields a significantly higher F1 score of 92.35, measured against manual annotations. This indicates that future work that more explicitly incorporates signals from parsing-like NLP tasks can generalize better on out-of-domain tasks.

[1]  Hyunki Kim,et al.  Open domain question answering using Wikipedia-based knowledge model , 2014, Inf. Process. Manag.

[2]  Guillaume Lample,et al.  Neural Architectures for Named Entity Recognition , 2016, NAACL.

[3]  Helen Nissenbaum,et al.  Privacy in Context - Technology, Policy, and the Integrity of Social Life , 2009 .

[4]  Travis D. Breaux,et al.  An Evaluation of Constituency-Based Hyponymy Extraction from Privacy Policies , 2017, 2017 IEEE 25th International Requirements Engineering Conference (RE).

[5]  Celine Latulipe,et al.  Contextual gaps: privacy issues on Facebook , 2009, Ethics and Information Technology.

[6]  Ali Farhadi,et al.  Bidirectional Attention Flow for Machine Comprehension , 2016, ICLR.

[7]  Michael Zimmer.  Privacy on Planet Google: Using the Theory of "Contextual Integrity" to Clarify the Privacy Threats of Google's Quest for the Perfect Search Engine , 2008 .

[8]  Jeffery von Ronne,et al.  Privacy promises that can be kept: a policy analysis method with application to the HIPAA privacy rule , 2013, SACMAT '13.

[9]  Jason Weston,et al.  Reading Wikipedia to Answer Open-Domain Questions , 2017, ACL.

[10]  Norman M. Sadeh,et al.  Identifying the Provision of Choices in Privacy Policy Text , 2017, EMNLP.

[11]  Eduard H. Hovy,et al.  End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF , 2016, ACL.

[12]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[13]  Jianwei Niu,et al.  Lexical Similarity of Information Type Hypernyms, Meronyms and Synonyms in Privacy Policies , 2016, AAAI Fall Symposia.

[14]  Noah A. Smith,et al.  Crowdsourcing Annotations for Websites' Privacy Policies: Can It Really Work? , 2016, WWW.

[15]  Kang G. Shin,et al.  Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning , 2018, USENIX Security Symposium.

[16]  Noah A. Smith,et al.  Automatic Categorization of Privacy Policies: A Pilot Study , 2012 .

[17]  Jian Zhang,et al.  SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.

[18]  Kenton Lee,et al.  Learning Recurrent Span Representations for Extractive Question Answering , 2016, ArXiv.

[19]  Travis D. Breaux,et al.  Towards an information type lexicon for privacy policies , 2015, 2015 IEEE Eighth International Workshop on Requirements Engineering and Law (RELAW).

[20]  Lorrie Faith Cranor,et al.  Disagreeable Privacy Policies: Mismatches between Meaning and Users’ Understanding , 2014 .

[21]  Travis D. Breaux,et al.  Mining Privacy Goals from Privacy Policies Using Hybridized Task Recomposition , 2016, ACM Trans. Softw. Eng. Methodol.

[22]  Helen Nissenbaum,et al.  Measuring Privacy: An Empirical Test Using Context To Expose Confounding Variables , 2015 .

[23]  Frederick Liu,et al.  The Creation and Analysis of a Website Privacy Policy Corpus , 2016, ACL.

[24]  Helen Nissenbaum,et al.  Privacy and contextual integrity: framework and applications , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[25]  Wei Zhang,et al.  R3: Reinforced Ranker-Reader for Open-Domain Question Answering , 2018, AAAI.

[26]  Norman M. Sadeh,et al.  Automatic Extraction of Opt-Out Choices from Privacy Policies , 2016, AAAI Fall Symposia.

[27]  Phil Blunsom,et al.  Teaching Machines to Read and Comprehend , 2015, NIPS.