Challenges in Automated Question Answering for Privacy Policies

Privacy policies are legal documents used to inform users about the collection and handling of their data by the services or technologies with which they interact. Research has shown that few users take the time to read these policies, as they are often long and difficult to understand. In addition, users often care about only a small subset of the issues discussed in privacy policies, and some of the issues they do care about may not be addressed in the policy text at all. Rather than requiring users to read the policies, a better approach might be to let them simply ask questions about the issues they care about, possibly through an iterative dialog. In this work, we take a step towards this goal by exploring the idea of an automated privacy question-answering assistant and examining the kinds of questions users are likely to pose to such a system. This analysis is informed by an initial study that elicits privacy questions from crowdworkers about the data practices of mobile apps: we analyze 1350 questions posed by crowdworkers about the privacy practices of a diverse cross section of mobile applications. The results shed some light on the privacy issues mobile app users are likely to inquire about, as well as on their ability to articulate questions in this domain. Our findings should in turn help inform the design of future privacy question-answering systems.
