Lexical Similarity of Information Type Hypernyms, Meronyms and Synonyms in Privacy Policies

Privacy policies communicate company data practices to consumers and must be accurate and comprehensive. Because each policy author is free to choose their own nomenclature when describing data practices, similar information types are described in different ways across policies. A formal ontology can help policy authors, users and regulators consistently check how data practice descriptions relate to other interpretations of information types. In this paper, we describe an empirical method for manually constructing an information type ontology from privacy policies. The method consists of seven heuristics, discovered through grounded analysis of five privacy policies, that explain how to infer hypernym, meronym and synonym relationships from information type phrases. We evaluated the method on 50 mobile privacy policies, which produced an ontology consisting of 355 unique information type names. Based on the manual results, we describe an automated technique consisting of 14 reusable semantic rules that extract hypernymy, meronymy and synonymy relations from information type phrases. Evaluated against the manually constructed ontology, the technique yields 0.95 precision and 0.51 recall.
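
To make the idea of a lexical rule over information type phrases concrete, the following is a minimal sketch in Python. It is not one of the 14 rules from the paper, whose definitions are not given in this abstract. It assumes a single hypothetical rule (dropping the leading modifier of a phrase yields a candidate hypernym) and shows how extracted relations could be scored against a manually built ontology using precision and recall. The function names, the example phrase and the gold relations are all illustrative assumptions.

```python
from typing import List, Set, Tuple

Relation = Tuple[str, str]  # (hyponym, hypernym)


def hypernym_candidates(phrase: str) -> List[Relation]:
    """Hypothetical rule: dropping the leading modifier gives a broader term.

    "mobile device identifier" -> "device identifier" -> "identifier"
    """
    words = phrase.lower().split()
    pairs = []
    for i in range(len(words) - 1):
        hyponym = " ".join(words[i:])
        hypernym = " ".join(words[i + 1:])
        pairs.append((hyponym, hypernym))
    return pairs


def precision_recall(predicted: Set[Relation], gold: Set[Relation]) -> Tuple[float, float]:
    """Precision = correct / predicted; recall = correct / gold."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Illustrative gold relations (assumed, not taken from the paper's ontology).
gold = {
    ("mobile device identifier", "device identifier"),
    ("device identifier", "identifier"),
    ("device identifier", "device information"),
    ("mobile device identifier", "mobile device information"),
}

predicted = set(hypernym_candidates("mobile device identifier"))
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setting the single rule finds only relations that are visible in the surface form of the phrase, so every prediction is correct but half of the gold relations are missed. That is the same general shape as the high-precision, lower-recall result reported above, although the numbers here are purely illustrative.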
