CASTLEGUARD: Anonymised Data Streams with Guaranteed Differential Privacy

Data streams are commonly used by data controllers to outsource the processing of real-time data to third-party data processors. Data protection legislation and best practice in data management support the view that data controllers are responsible for providing a guarantee of privacy for user data contained within published data streams. Continuously Anonymising STreaming data via adaptive cLustEring (CASTLE) is an established method for anonymising data streams with a guarantee of k-anonymity. However, k-anonymity has been shown to be a weak privacy guarantee with exploitable vulnerabilities in practical applications. In this paper we propose Continuously Anonymising STreaming data via adaptive cLustEring with GUARanteed Differential privacy (CASTLEGUARD), a data stream anonymisation algorithm that provides a reliable guarantee of k-anonymity, l-diversity and differential privacy to data subjects. We analyse CASTLEGUARD to show that, through safe k-anonymisation and β-sampling, the proposed approach satisfies differentially private k-anonymity. Further, we demonstrate the efficacy of the approach in the context of machine learning, presenting experimental analysis showing that it can protect the individual privacy of users whilst maintaining the utility of a data stream.
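To make the β-sampling step concrete, the following Python sketch admits each arriving tuple independently with probability β before any clustering or generalisation takes place. Randomly suppressing tuples in this way is what lifts a "safe" k-anonymisation to a differential privacy guarantee, in the sense of Li et al.'s result that k-anonymisation preceded by random sampling satisfies differential privacy. This is a minimal sketch under our own naming assumptions (beta_sample and its parameters are ours, not the authors' implementation or API):

import random

def beta_sample(stream, beta=0.5, rng=None):
    # Hypothetical sketch, not the paper's code: each tuple from the
    # input stream is independently admitted with probability beta.
    # The surviving tuples would then be passed on to the clustering
    # and generalisation stages of the anonymiser.
    rng = rng or random.Random()
    for item in stream:
        if rng.random() < beta:
            yield item

# Example usage: with beta = 0.5, roughly half of the input survives
# sampling; the rest is suppressed and never published.
sampled = list(beta_sample(range(1000), beta=0.5))

Because an adversary cannot tell whether a given individual's tuple was suppressed by the sampler or generalised into an equivalence class, sampling introduces the plausible deniability that plain k-anonymity lacks.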
