Surfacing Privacy Settings Using Semantic Matching

Online services utilize privacy settings to provide users with control over their data. However, these privacy settings are often hard to locate, causing the user to rely on provider-chosen default values. In this work, we train privacy-settings-centric encoders and leverage them to create an interface that allows users to search for privacy settings using free-form queries. In order to achieve this goal, we create a custom Semantic Similarity dataset, which consists of real user queries covering various privacy settings. We then use this dataset to fine-tune a state of the art encoder. Using this fine-tuned encoder, we perform semantic matching between the user queries and the privacy settings to retrieve the most relevant setting. Finally, we also use the encoder to generate embeddings of privacy settings from the top 100 websites and perform unsupervised clustering to learn about the online privacy settings types. We find that the most common type of privacy settings are ‘Personalization’ and ‘Notifications’, with coverage of 35.8% and 34.4%, respectively, in our dataset.

[1]  J. R. Landis,et al.  The measurement of observer agreement for categorical data. , 1977, Biometrics.

[2]  Krishna P. Gummadi,et al.  Analyzing facebook privacy settings: user expectations vs. reality , 2011, IMC '11.

[3]  Bin Liu,et al.  Automated Analysis of Privacy Requirements for Mobile Apps , 2016, NDSS.

[4]  Tao Xie,et al.  PolicyLint: Investigating Internal Privacy Policy Contradictions on Google Play , 2019, USENIX Security Symposium.

[5]  Norman M. Sadeh,et al.  PrivOnto: A semantic framework for the analysis of privacy policies , 2017 .

[6]  Nan Hua,et al.  Universal Sentence Encoder , 2018, ArXiv.

[7]  Norman M. Sadeh,et al.  Reconciling mobile app privacy and usability on smartphones: could user privacy profiles help? , 2014, WWW.

[8]  Jonathan Berant,et al.  MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension , 2019, ACL.

[9]  Claudia Niederée,et al.  Analyzing and Predicting Privacy Settings in the Social Web , 2015, UMAP.

[10]  Sebastian Ruder,et al.  Universal Language Model Fine-tuning for Text Classification , 2018, ACL.

[11]  Norman M. Sadeh,et al.  Identifying the Provision of Choices in Privacy Policy Text , 2017, EMNLP.

[12]  Norman M. Sadeh,et al.  Modeling Users' Mobile App Privacy Preferences: Restoring Usability in a Sea of Permission Settings , 2014, SOUPS.

[13]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[14]  Hana Habib,et al.  "It's a scavenger hunt": Usability of Websites' Opt-Out and Data Deletion Choices , 2020, CHI.

[15]  Alessandro Acquisti,et al.  Follow My Recommendations: A Personalized Privacy Assistant for Mobile App Permissions , 2016, SOUPS.

[16]  Iryna Gurevych,et al.  Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks , 2019, EMNLP.

[17]  Norman M. Sadeh,et al.  MAPS: Scaling Privacy Compliance Analysis to a Million Apps , 2019, Proc. Priv. Enhancing Technol..

[18]  Omer Levy,et al.  RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.

[19]  Najmeh Mousavi Nejad Semantic Analysis of Contractual Agreements to Support End-User Interpretation , 2018, EKAW.

[20]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[21]  Dandan Xu,et al.  Demystifying Hidden Privacy Settings in Mobile Apps , 2019, 2019 IEEE Symposium on Security and Privacy (SP).

[22]  Hana Habib,et al.  Finding a Choice in a Haystack: Automatic Extraction of Opt-Out Statements from Privacy Policy Text , 2020, WWW.

[23]  Namita Nisal Increasing the Salience of Data Use Opt-outs Online , 2017 .

[24]  Yonatan Belinkov,et al.  Linguistic Knowledge and Transferability of Contextual Representations , 2019, NAACL.

[25]  Hal Daumé,et al.  Deep Unordered Composition Rivals Syntactic Methods for Text Classification , 2015, ACL.

[26]  Lorrie Faith Cranor,et al.  An Empirical Analysis of Data Deletion and Opt-Out Choices on 150 Websites , 2019, SOUPS @ USENIX Security Symposium.

[27]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[28]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[29]  Kang G. Shin,et al.  Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning , 2018, USENIX Security Symposium.