Automatic Extraction of Opt-Out Choices from Privacy Policies

Online “notice and choice” is an essential concept in the US FTC’s Fair Information Practice Principles. Privacy laws based on these principles require services to provide notice about their data practices and to let individuals exercise control over those practices. Internet users need control over their privacy, but their options are buried in long privacy policies that are cumbersome to read and understand. In this paper, we describe several approaches to automatically extracting choice instances from privacy policy documents using natural language processing and machine learning techniques. We define a choice instance as a statement in a privacy policy indicating that the user has discretion over the collection, use, sharing, or retention of their data. We describe machine learning approaches for extracting instances that contain opt-out hyperlinks, and we evaluate the proposed methods on the OPP-115 Corpus, a dataset of annotated privacy policies. Extracting information about privacy choices and controls enables the development of concise, usable interfaces that help Internet users better understand the choices offered by online services.
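To make the notion of a choice instance containing an opt-out hyperlink concrete, the following is a minimal sketch of a keyword baseline, not the machine learning approach described in the paper. It assumes each policy sentence is available as an HTML fragment; the cue list `OPT_OUT_CUES`, the class `AnchorCollector`, and the function `find_opt_out_links` are hypothetical names introduced only for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical cue phrases, chosen for illustration; not taken from the paper.
OPT_OUT_CUES = ("opt out", "opt-out", "unsubscribe", "do not track", "disable")


class AnchorCollector(HTMLParser):
    """Collects (anchor text, href) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # Start recording text when an <a> tag opens (href may be missing).
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def find_opt_out_links(sentence_html):
    """Return the hrefs in a sentence whose plain text contains an opt-out cue."""
    parser = AnchorCollector()
    parser.feed(sentence_html)
    parser.close()
    # Strip tags to get the sentence text, then look for any cue phrase.
    plain = re.sub(r"<[^>]+>", " ", sentence_html).lower()
    if not any(cue in plain for cue in OPT_OUT_CUES):
        return []
    return [href for _text, href in parser.links]
```

A sentence such as `You may <a href="https://example.com/optout">opt out</a> of targeted ads.` would be flagged, while a sentence that merely links to terms of service would not. A learned classifier would replace the fixed cue list with features estimated from annotated data such as the OPP-115 Corpus.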
