Privacy-Aware Personal Data Storage (P-PDS): Learning how to Protect User Privacy from External Applications

Recently, Personal Data Storage (PDS) has brought a substantial change to the way people store and control their personal data, by moving from a service-centric to a user-centric model. A PDS lets individuals keep their data in a single logical repository that can be connected to and exploited by suitable analytical tools, or shared with third parties under the control of end users. Up to now, most research on PDS has focused on how to enforce user privacy preferences and how to secure data stored in the PDS. In contrast, in this paper we aim to design a Privacy-aware Personal Data Storage (P-PDS), that is, a PDS able to automatically make privacy-aware decisions on third-party access requests in accordance with user preferences. The proposed P-PDS builds on preliminary results presented in [1], which demonstrated that semi-supervised learning can be successfully exploited to let a PDS automatically decide whether an access request should be authorized. In this paper, we have deeply revised the learning process to obtain a more usable P-PDS, in terms of reduced effort in the training phase, as well as a more conservative approach with respect to users' privacy when handling conflicting access requests. We ran several experiments on a realistic dataset, involving a group of 360 evaluators. The obtained results show the effectiveness of the proposed approach.
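To make the idea of a conservative, low-effort decision rule concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a learned classifier that outputs `p_grant`, an estimate of the probability that the owner would authorize a request. The function name `decide`, the `AccessRequest` fields, and both thresholds are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Literal

Decision = Literal["grant", "deny", "ask_owner"]


@dataclass(frozen=True)
class AccessRequest:
    # Hypothetical fields: the abstract does not specify the request format.
    requester: str
    data_type: str
    purpose: str


def decide(p_grant: float,
           grant_threshold: float = 0.9,
           deny_threshold: float = 0.5) -> Decision:
    """Map an estimated grant probability to a privacy-conservative decision.

    Assumption: p_grant comes from some trained classifier (not shown here).
    The thresholds are illustrative and are not values from the paper.
    """
    if not 0.0 <= p_grant <= 1.0:
        raise ValueError("p_grant must be a probability in [0, 1]")
    if p_grant >= grant_threshold:
        # Authorize only when the model is highly confident.
        return "grant"
    if p_grant < deny_threshold:
        # Low estimated probability of consent: deny.
        return "deny"
    # Uncertain band: defer to the owner, whose answer can serve as a new label.
    return "ask_owner"


request = AccessRequest("fitness-app", "location", "route statistics")
decision = decide(0.7)  # falls in the uncertain band for this request
```

In this sketch, widening the band between the two thresholds trades more owner interaction for fewer automatic authorizations, which is one simple way to lean toward privacy when the model is unsure.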

[1] Brian M. Sweatt, et al. A privacy-preserving personal sensor data ecosystem, 2014.

[2] Frank Wang, et al. Sieve: Cryptographically Enforced Access Control for User Data in Untrusted Clouds, 2016, NSDI.

[3] Marko Hölbl, et al. Privacy antecedents for SNS self-disclosure: The case of Facebook, 2015, Comput. Hum. Behav.

[4] Burr Settles, et al. Active Learning Literature Survey, 2009.

[5] Deborah L. McGuinness, et al. Bringing Semantics to Web Services: The OWL-S Approach, 2004, SWSWPC.

[6] Craig A. Knoblock, et al. Active + Semi-supervised Learning = Robust Multi-View Learning, 2002, ICML.

[7] George Danezis. Inferring privacy policies for social networking services, 2009, AISec '09.

[8] Yan Zhu, et al. BC-PDS: Protecting Privacy and Self-Sovereignty through BlockChains for OpenPDS, 2017, 2017 IEEE Symposium on Service-Oriented System Engineering (SOSE).

[9] Steven M. Bellovin, et al. A study of privacy settings errors in an online social network, 2012, 2012 IEEE International Conference on Pervasive Computing and Communications Workshops.

[10] Jason Hong, et al. Understanding and capturing people's privacy policies in a mobile social networking application, 2009.

[11] Erez Shmueli, et al. openPDS: Protecting the Privacy of Metadata through SafeAnswers, 2014, PLoS ONE.

[12] Barbara Carminati, et al. Privacy Settings Recommender for Online Social Network, 2016, 2016 IEEE 2nd International Conference on Collaboration and Internet Computing (CIC).

[13] Xiaojin Zhu, et al. --1 CONTENTS, 2006.

[14] Kristen LeFevre, et al. Privacy wizards for social networking sites, 2010, WWW '10.

[15] Qi Li, et al. Personal Data Management with the Databox: What's Inside the Box?, 2016, CAN@CoNEXT.

[16] Alessandro Acquisti, et al. Imagined Communities: Awareness, Information Sharing, and Privacy on the Facebook, 2006, Privacy Enhancing Technologies.

[17] Barbara Carminati, et al. A Risk-Benefit Driven Architecture for Personal Data Release (Invited Paper), 2016, 2016 IEEE 17th International Conference on Information Reuse and Integration (IRI).

[18] Yolande Belaïd, et al. A Stream-Based Semi-supervised Active Learning Approach for Document Classification, 2013, 2013 12th International Conference on Document Analysis and Recognition.

[19] David A. Cohn, et al. Improving generalization with active learning, 1994, Machine Learning.

[20] Robert Morris, et al. Oort: User-Centric Cloud Storage with Global Queries, 2016.

[21] Sean Borman, et al. The Expectation Maximization Algorithm: A short tutorial, 2006.

[22] Krishna P. Gummadi, et al. Analyzing Facebook privacy settings: user expectations vs. reality, 2011, IMC '11.

[23] Shinsaku Kiyomoto, et al. Easing the Burden of Setting Privacy Preferences: A Machine Learning Approach, 2016, ICISSP.

[24] Barbara Carminati, et al. Learning Privacy Habits of PDS Owners, 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).

[25] Geoff Holmes, et al. Active Learning With Drifting Streaming Data, 2014, IEEE Transactions on Neural Networks and Learning Systems.

[26] Alessandro Acquisti, et al. Information revelation and privacy in online social networks, 2005, WPES '05.

[27] Mthulisi Velempini, et al. Privacy and user awareness on Facebook, 2018, South African Journal of Science.

[28] Andrew McCallum, et al. Employing EM and Pool-Based Active Learning for Text Classification, 1998, ICML.

[29] Jerry R. Hobbs, et al. DAML-S: Semantic Markup for Web Services, 2001, SWWS.

[30] Robert Tappan Morris, et al. Amber: Decoupling User Data from Web Applications, 2015, HotOS.

[31] Graeme Hirst, et al. Distributional Measures of Semantic Distance: A Survey, 2012, ArXiv.

[32] P. Jaccard, et al. Etude comparative de la distribution florale dans une portion des Alpes et des Jura, 1901.