KnIGHT: Mapping Privacy Policies to GDPR

Although the use of apps and online services comes with accompanying privacy policies, a majority of end-users ignore them due to their length, complexity and unappealing presentation.pite the potential risks. In light of the, now enforced EU-wide, General Data Protection Regulation (GDPR) we present an automatic technique for mapping privacy policies excerpts to relevant GDPR articles so as to support average users in understanding their usage risks and rights as a data subject. KnIGHT (Know your rIGHTs), is a tool that finds candidate sentences in a privacy policy that are potentially related to specific articles in the GDPR. The approach employs semantic text matching in order to find the most appropriate GDPR paragraph, and to the best of our knowledge is one of the first automatic attempts of its kind applied to a company’s policy. Our evaluation shows that on average between 70–90% of the tool’s automatic mappings are at least partially correct, meaning that the tool can be used to significantly guide human comprehension. Following this result, in the future we will utilize domain-specific vocabularies to perform a deeper semantic analysis and improve the results further.

[1]  Declan O'Sullivan,et al.  GDPRtEXT - GDPR as a Linked Data Resource , 2018, ESWC.

[2]  Frederick Liu,et al.  The Creation and Analysis of a Website Privacy Policy Corpus , 2016, ACL.

[3]  Anne Oeldorf-Hirsch,et al.  The Biggest Lie on the Internet: Ignoring the Privacy Policies and Terms of Service Policies of Social Networking Services , 2020 .

[4]  Alessandro Acquisti,et al.  Privacy and rationality in individual decision making , 2005, IEEE Security & Privacy.

[5]  John Mylopoulos,et al.  GaiusT: supporting the extraction of rights and obligations for regulatory compliance , 2013, Requirements Engineering.

[6]  Kang G. Shin,et al.  Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning , 2018, USENIX Security Symposium.

[7]  Naomie Salim,et al.  Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[8]  Colin Potts,et al.  Privacy practices of Internet users: Self-reports versus observed behavior , 2005, Int. J. Hum. Comput. Stud..

[9]  Lorrie Faith Cranor,et al.  User interfaces for privacy agents , 2006, TCHI.

[10]  J. Fleiss Measuring agreement between two judges on the presence or absence of a trait. , 1975, Biometrics.

[11]  Annie I. Antón,et al.  Towards Regulatory Compliance: Extracting Rights and Obligations to Align Requirements with Regulations , 2006, 14th IEEE International Requirements Engineering Conference (RE'06).

[12]  George Hripcsak,et al.  Technical Brief: Agreement, the F-Measure, and Reliability in Information Retrieval , 2005, J. Am. Medical Informatics Assoc..

[13]  Cesare Bartolini,et al.  Reconciling Data Protection Rights and Obligations: An Ontology of the Forthcoming EU Regulation , 2015 .

[14]  M. de Rijke,et al.  Siamese CBOW: Optimizing Word Embeddings for Sentence Representations , 2016, ACL.

[15]  Gary William Grewal,et al.  A Machine-Learning Based Approach for Measuring the Completeness of Online Privacy Policies , 2015, 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA).

[16]  Jerry den Hartog,et al.  A machine learning solution to assess privacy policy completeness: (short paper) , 2012, WPES '12.

[17]  George Hripcsak,et al.  Measuring agreement in medical informatics reliability studies , 2002, J. Biomed. Informatics.

[18]  Jacob Cohen,et al.  Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. , 1968 .