Large-scale readability analysis of privacy policies

Online privacy policies inform the users of a website how their personal information is collected, processed, and stored. Against the background of rising privacy concerns, privacy policies appear to be an influential instrument for increasing customer trust and loyalty. In practice, however, consumers rarely read them, possibly reflecting the common assumption that policies are hard to comprehend. By designing and implementing an automated extraction and readability analysis toolset that incorporates a range of established readability measures, we present the first large-scale study providing current empirical evidence on the readability of nearly 50,000 privacy policies of popular English-language websites. The results confirm empirically that, on average, current privacy policies are still hard to read. Furthermore, this study offers new theoretical insights for readability research, in particular on the extent to which practical readability measures are correlated. Specifically, it shows that several well-established readability metrics (SMOG, RIX, LIX, GFI, FKG, ARI, and FRES) are redundant. This finding simplifies the choice of metric in future work, eases comparisons between readability studies, and calls for research towards a framework of readability measures. In addition, we provide a more sophisticated privacy policy extractor and analyzer, as well as a solid corpus of policy texts for further research.
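
To make the kind of measure named above concrete, the following is a minimal sketch of three of the listed metrics (LIX, RIX, and ARI), which need only character, word, and sentence counts. It is not the toolset described in the abstract: the function name `readability_scores` and the naive regex-based sentence and word segmentation are assumptions made for illustration, and real policy text would need more careful tokenization.

```python
import re


def readability_scores(text):
    """Return LIX, RIX, and ARI for a plain-text passage (illustrative sketch)."""
    # Naive segmentation (an assumption of this sketch): sentences end at
    # '.', '!' or '?', and words are runs of ASCII letters or digits.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    n_sent = len(sentences)
    n_words = len(words)
    n_chars = sum(len(w) for w in words)
    # LIX and RIX count a word as "long" if it has more than 6 characters.
    n_long = sum(1 for w in words if len(w) > 6)

    return {
        # LIX: average sentence length plus percentage of long words.
        "LIX": n_words / n_sent + 100 * n_long / n_words,
        # RIX: long words per sentence.
        "RIX": n_long / n_sent,
        # ARI: Automated Readability Index.
        "ARI": 4.71 * n_chars / n_words + 0.5 * n_words / n_sent - 21.43,
    }


if __name__ == "__main__":
    sample = "We collect data. We share information with partners."
    print(readability_scores(sample))
```

For the sample passage (2 sentences, 8 words, 3 long words, 43 characters) this gives LIX = 41.5, RIX = 1.5, and ARI ≈ 5.89. Because LIX and RIX are built from the same long-word count, strong correlation between them across a corpus would be unsurprising, which is in line with the redundancy the abstract reports.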
