Defining, Enforcing and Checking Privacy Policies in Data-Intensive Applications

The rise of Big Data is leading to an increasing demand for large-scale data-intensive applications (DIAs), which have to analyse massive amounts of personal data (e.g. customers' locations, cars' speeds, people's heartbeats), some of which can be sensitive, meaning that their confidentiality has to be protected. In this context, DIA providers are responsible for enforcing privacy policies that account for both the privacy preferences of data subjects and general privacy regulations. This is the case, for instance, for data brokers, i.e. companies that continuously collect and analyse data in order to provide useful analytics to their clients. Unfortunately, enforcing privacy policies in modern DIAs tends to become cumbersome because (i) the number of policies can easily explode, depending on the number of data subjects, (ii) policy enforcement has to adapt autonomously to the application context, thus requiring some non-trivial runtime reasoning, and (iii) designing and developing modern DIAs is complex per se. For these reasons, we need specific design and runtime methods enabling so-called privacy-by-design in a Big Data context. In this article we propose an approach for specifying, enforcing and checking privacy policies on DIAs designed according to the Google Dataflow model, and we show that the enforcement approach behaves correctly in the considered cases and introduces a performance overhead that is acceptable given the requirements of a typical DIA.
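To make the idea of enforcing per-subject privacy policies inside a dataflow pipeline more concrete, here is a minimal Python sketch. It is not the approach proposed in the article: the `Policy`, `Action`, `enforce` and `generalize` names, the allow/generalize/deny actions, the use of a `purpose` field as the "context", and the default-deny choice are all hypothetical assumptions. A plain Python generator stands in for an element-wise Dataflow transform (such as a ParDo) placed before the analytics step.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Iterator, Tuple


class Action(Enum):
    # Hypothetical enforcement actions (assumption, not from the article).
    ALLOW = "allow"
    GENERALIZE = "generalize"  # coarsen the value before releasing it
    DENY = "deny"              # drop the record entirely


@dataclass(frozen=True)
class Record:
    subject_id: str  # data subject the record is about
    attribute: str   # e.g. "speed"
    value: float
    purpose: str     # hypothetical context: why downstream analytics wants it


@dataclass(frozen=True)
class Policy:
    # Hypothetical per-subject preference: an action per (attribute, purpose).
    rules: Dict[Tuple[str, str], Action]
    default: Action = Action.DENY

    def decide(self, rec: Record) -> Action:
        return self.rules.get((rec.attribute, rec.purpose), self.default)


def generalize(value: float, bucket: float = 10.0) -> float:
    # Coarsen a numeric value to the lower bound of its bucket.
    return (value // bucket) * bucket


def enforce(records: Iterable[Record], policies: Dict[str, Policy]) -> Iterator[Record]:
    # Stateless, element-wise filter/transform applied before any analytics.
    for rec in records:
        policy = policies.get(rec.subject_id)
        action = policy.decide(rec) if policy else Action.DENY  # no policy: deny
        if action is Action.ALLOW:
            yield rec
        elif action is Action.GENERALIZE:
            yield Record(rec.subject_id, rec.attribute, generalize(rec.value), rec.purpose)
        # Action.DENY: the record is silently dropped


# Usage: three data subjects with different (hypothetical) preferences.
policies = {
    "alice": Policy({("speed", "traffic-stats"): Action.ALLOW}),
    "bob": Policy({("speed", "traffic-stats"): Action.GENERALIZE}),
    # "carol" has no policy, so her records are denied by default.
}
stream = [
    Record("alice", "speed", 87.0, "traffic-stats"),
    Record("bob", "speed", 93.5, "traffic-stats"),
    Record("carol", "speed", 120.0, "traffic-stats"),
    Record("alice", "speed", 64.0, "insurance"),  # purpose not covered: denied
]
released = list(enforce(stream, policies))
print([(r.subject_id, r.value) for r in released])  # [('alice', 87.0), ('bob', 90.0)]
```

Because `enforce` looks at one element at a time and keeps no state, it could be parallelized like any stateless transform. Context-dependent policies that need runtime reasoning over time (point (ii) in the abstract) would require stateful or windowed processing, which this sketch deliberately leaves out.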
