Crowdsourcing Annotations for Websites' Privacy Policies: Can It Really Work?

Website privacy policies are often long and difficult to understand. While research shows that Internet users care about their privacy, they do not have time to understand the policies of every website they visit, and most hardly ever read privacy policies. Several recent efforts aim to crowdsource the interpretation of privacy policies and use the resulting annotations to build more effective user interfaces that provide users with salient policy summaries. However, very little attention has been devoted to studying the accuracy and scalability of crowdsourced privacy policy annotations, the types of questions crowdworkers can effectively answer, and the ways in which their productivity can be enhanced. Prior research indicates that most Internet users have great difficulty understanding privacy policies, suggesting limits to the effectiveness of crowdsourcing approaches. In this paper, we assess the viability of crowdsourcing privacy policy annotations. Our results suggest that, if carefully deployed, crowdsourcing can indeed produce non-trivial annotations and can also help identify ambiguities in policies. We further introduce and evaluate a method to improve the annotation process by predicting and highlighting paragraphs relevant to specific data practices.
