Privee: An Architecture for Automatically Analyzing Web Privacy Policies

Privacy policies on websites are based on the notice-and-choice principle: they notify Web users of their privacy choices. However, many users do not read privacy policies or have difficulty understanding them. To increase privacy transparency, we propose Privee, a software architecture for analyzing essential policy terms based on crowdsourcing and automatic classification techniques. We implement Privee in a proof-of-concept browser extension that retrieves policy analysis results from an online privacy policy repository or, if no such results are available, performs automatic classifications. While our classifiers achieve an overall F-1 score of 90%, our experimental results suggest that classifier performance is inherently limited, as it correlates with the same variable with which human interpretations correlate: the ambiguity of natural language. This finding might be interpreted to call the notice-and-choice principle into question altogether. However, as our results further suggest that policy ambiguity decreases over time, we believe that the principle is workable. Consequently, we see Privee as a promising avenue for facilitating the notice-and-choice principle by accurately notifying Web users of privacy practices and increasing privacy transparency on the Web.
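The retrieve-or-classify flow described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `analyze_policy`, `keyword_classifier`, and the label `mentions_email`, as well as the in-memory dictionary standing in for the online repository, are all assumptions made for the example. The keyword matcher is only a placeholder for whatever classifiers the system actually uses. The `f1_score` helper shows the standard definition of the F-1 measure as the harmonic mean of precision and recall.

```python
from typing import Callable, Dict, Optional

Labels = Dict[str, bool]


def keyword_classifier(policy_text: str) -> Labels:
    """Placeholder classifier (assumption): flags a practice by keyword match."""
    text = policy_text.lower()
    return {"mentions_email": "email" in text}


def analyze_policy(
    url: str,
    policy_text: str,
    repository: Dict[str, Labels],
    classify: Callable[[str], Labels] = keyword_classifier,
) -> Dict[str, object]:
    """Return stored analysis results if present, else classify automatically."""
    stored: Optional[Labels] = repository.get(url)
    if stored is not None:
        # Results already exist in the (hypothetical) repository.
        return {"source": "repository", "labels": stored}
    # No stored results: fall back to automatic classification.
    return {"source": "classifier", "labels": classify(policy_text)}


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    repo = {"https://example.com/privacy": {"mentions_email": True}}
    print(analyze_policy("https://example.com/privacy", "", repo))
    print(analyze_policy("https://example.org/privacy", "We store your email.", repo))
    print(f1_score(tp=9, fp=1, fn=1))
```

Passing the classifier in as a parameter keeps the lookup logic independent of any particular classification technique, which mirrors the separation between repository and classifier described in the abstract.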
