The Privacy Policy Landscape After the GDPR

Abstract

The EU General Data Protection Regulation (GDPR) is one of the most demanding and comprehensive privacy regulations of all time. A year after it went into effect, we study its impact on the landscape of privacy policies online. We conduct the first longitudinal, in-depth, and at-scale assessment of privacy policies before and after the GDPR. We gauge the complete consumption cycle of these policies, from the first user impressions to the compliance assessment. We create a diverse corpus of two sets of 6,278 unique English-language privacy policies from inside and outside the EU, covering their pre-GDPR and post-GDPR versions. The results of our tests and analyses suggest that the GDPR has been a catalyst for a major overhaul of privacy policies inside and outside the EU. This overhaul manifests in extensive textual changes, especially for EU-based websites, and brings mixed benefits to users. While privacy policies have become considerably longer, our user study with 470 participants on Amazon MTurk indicates a significant improvement in the visual representation of privacy policies from the users’ perspective for EU websites. We further develop a new workflow for the automated assessment of requirements in privacy policies. Using this workflow, we show that, after the GDPR, privacy policies cover more data practices and are more consistent with seven compliance requirements. We also assess how transparent organizations are about their privacy practices by performing a specificity analysis. In this analysis, we find evidence of positive changes triggered by the GDPR, with the specificity level improving on average. Still, we find the landscape of privacy policies to be in a transitional phase: many policies still do not meet several key GDPR requirements, or their improved coverage comes with reduced specificity.
