I Read but Don't Agree: Privacy Policy Benchmarking using Machine Learning and the EU GDPR

With the continuing growth of the Internet landscape, users share large amounts of personal, sometimes privacy-sensitive, data. In doing so, users often have little or no clear knowledge of what service providers do with the trails of personal data they leave on the Internet. While regulations impose rather strict requirements on service providers, the de facto approach to communicating data processing practices seems to be the privacy policy. However, privacy policies are too long and complex for users to read and understand, and thus fail at their very objective: informing users about how service providers promise to process their data. To address this issue, we propose a machine-learning-based method that summarizes lengthy privacy policies into short, condensed notes, following a risk-based approach and using aspects of the European Union (EU) General Data Protection Regulation (GDPR) as assessment criteria. The results are promising and indicate that our tool can summarize lengthy privacy policies in a short time, thus helping users make informed decisions about their information disclosure behavior.
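To make the idea of a risk-based, aspect-wise summary more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the model, so this example assumes a multinomial Naive Bayes classifier that labels each policy sentence with an (aspect, risk) pair. The aspect names `sharing` and `retention`, the two-level risk scale, the helper names (`NaiveBayes`, `summarize`, `MODEL`), and the toy training sentences are all hypothetical illustrations loosely inspired by GDPR topics; none of them come from the paper or its dataset.

```python
# Hypothetical sketch, not the authors' implementation: a multinomial Naive Bayes
# classifier labels each policy sentence with an (aspect, risk) pair, and the
# summary keeps the worst risk found per aspect. Aspect names, risk levels and
# the toy training sentences below are illustrative assumptions.
import math
import re
from collections import Counter, defaultdict

ASPECTS = ("sharing", "retention")   # assumed GDPR-inspired aspects
RISK_ORDER = {"low": 0, "high": 1}   # assumed two-level risk scale

# Invented toy examples for illustration only (not the paper's data).
TRAIN = [
    ("We sell your personal data to third parties for advertising.", "sharing/high"),
    ("We share your information with advertising partners.", "sharing/high"),
    ("We never share your personal data with third parties.", "sharing/low"),
    ("We do not sell or disclose your data to others.", "sharing/low"),
    ("We retain your data indefinitely.", "retention/high"),
    ("We keep your information for as long as we wish.", "retention/high"),
    ("We delete your data after 30 days.", "retention/low"),
    ("Your data is erased when you close your account.", "retention/low"),
]


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphabetic runs only (digits and punctuation are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        """Return the most probable label, or None if no token is in the vocabulary."""
        tokens = [w for w in tokenize(text) if w in self.vocab]
        if not tokens:
            return None
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n / total)
            score += sum(math.log((counts[w] + 1) / denom) for w in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def summarize(policy: str, model: NaiveBayes) -> dict[str, str]:
    """Condense a policy into one risk level per aspect (worst sentence wins)."""
    summary = {aspect: "not mentioned" for aspect in ASPECTS}
    for sentence in re.split(r"(?<=[.!?])\s+", policy.strip()):
        label = model.predict(sentence)
        if label is None:
            continue  # sentence carries no known vocabulary
        aspect, risk = label.split("/")
        current = summary[aspect]
        if current == "not mentioned" or RISK_ORDER[risk] > RISK_ORDER[current]:
            summary[aspect] = risk
    return summary


MODEL = NaiveBayes().fit([t for t, _ in TRAIN], [label for _, label in TRAIN])

if __name__ == "__main__":
    policy = "We sell your data to advertising partners. We erase your data after 30 days."
    print(summarize(policy, MODEL))
```

Running the script prints `{'sharing': 'high', 'retention': 'low'}`. Keeping the worst risk per aspect is a deliberately conservative aggregation choice for this sketch: a single risky sentence is enough to flag an aspect. A real tool would likely use more aspects, finer risk levels, and a model trained on an annotated policy corpus.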
