FAST: Fast Anonymization of Big Data Streams

This paper proposes FAST, an algorithm that speeds up the anonymization of big data streams. FAST is a parallel algorithm that uses multithreading to anonymize big data efficiently. It applies a proactive time-expiration heuristic to publish data before they expire. Our simulation results indicate a significant improvement in big data stream anonymization in terms of information loss and the cost metric.
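To make the idea of proactive expiration concrete, the following single-threaded Python sketch illustrates one way such a heuristic could work; it is not the FAST algorithm itself, whose details are not given here. The function name `anonymize_stream`, the parameters `k`, `delta`, and `margin`, and the nearest-value clustering rule are all assumptions introduced for illustration. A tuple is treated as about to expire once its age reaches `delta - margin` arrivals, at which point it is grouped with its `k - 1` nearest buffered neighbours and the group is published as a generalized range.

```python
from collections import deque


def anonymize_stream(values, k=3, delta=5, margin=1):
    """Illustrative sketch only (hypothetical names and rules, not FAST).

    Each numeric value is published as a (low, high) range shared by at
    least k tuples. A tuple is published proactively once its age reaches
    delta - margin arrivals, i.e. shortly before the deadline delta.
    """
    buffer = deque()   # pending (position, value) pairs, oldest first
    published = []     # (position, generalized range) in publication order

    for pos, value in enumerate(values):
        buffer.append((pos, value))
        # Proactive check: act while the oldest tuple is close to expiring
        # and enough tuples are buffered to form a group of size k.
        while (buffer
               and pos - buffer[0][0] >= delta - margin
               and len(buffer) >= k):
            oldest = buffer[0]
            # Assumed clustering rule: the k-1 buffered values nearest
            # to the expiring one.
            rest = sorted(list(buffer)[1:],
                          key=lambda t: abs(t[1] - oldest[1]))
            cluster = [oldest] + rest[:k - 1]
            low = min(v for _, v in cluster)
            high = max(v for _, v in cluster)
            for p, _ in cluster:
                published.append((p, (low, high)))
            members = {p for p, _ in cluster}
            buffer = deque(t for t in buffer if t[0] not in members)

    # Leftover tuples cannot form a group of k; this sketch suppresses them.
    for p, _ in buffer:
        published.append((p, None))
    return published


if __name__ == "__main__":
    for position, interval in anonymize_stream(
            [10, 12, 50, 11, 52, 51, 13], k=3, delta=4, margin=1):
        print(position, interval)
```

The sketch omits the multithreading that the abstract describes. It also lets a tuple wait past its deadline when fewer than `k` tuples are buffered, a case that a complete algorithm would have to handle explicitly.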
