Supporting Pattern-Preserving Anonymization for Time-Series Data

Time series are an important form of data in numerous applications and often contain a vast amount of private personal information. The need to protect privacy in time-series data while effectively supporting complex queries on them poses nontrivial challenges to the database community. We study the anonymization of time series while supporting complex queries, such as range and pattern-matching queries, on the published data. The conventional k-anonymity model cannot effectively address this problem because it may suffer severe pattern loss. We propose a novel anonymization model called (k, P)-anonymity for pattern-rich time series. This model publishes both the attribute values and the patterns of time series in separate data forms. We demonstrate that our model can prevent linkage attacks on the published data while effectively supporting a wide variety of queries on the anonymized data. We propose two algorithms to enforce (k, P)-anonymity on time-series data. Our anonymity model also supports customized data publishing, in which one part of the values and a different part of the pattern of an anonymized time series can be published simultaneously. We present estimation techniques to support query processing on such customized data. The proposed methods are evaluated in a comprehensive experimental study, and the results verify the effectiveness and efficiency of our approach.
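The idea of publishing values and patterns in separate forms can be illustrated with a small sketch. The abstract does not give the formal definitions, so everything below is an assumption made for illustration: values are generalized to a per-timestamp [min, max] envelope over a group of at least k series, patterns are encoded with a coarse symbolic string loosely inspired by SAX-style representations, and each published pattern is required to be shared by at least P series in the group. The function names `value_envelope`, `coarse_pattern`, and `satisfies_kp` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from statistics import mean, pstdev


def value_envelope(group):
    """Per-timestamp (min, max) range over a group of equal-length series."""
    return [(min(col), max(col)) for col in zip(*group)]


def coarse_pattern(series, segments=4):
    """Assumed pattern form: z-normalize, average each segment, map to a symbol.

    Uses a 3-letter alphabet with breakpoints -0.43 and 0.43, which split a
    standard normal distribution into three roughly equiprobable regions.
    Assumes len(series) is a multiple of `segments`.
    """
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma else 0.0 for x in series]
    seg_len = len(z) // segments
    breakpoints = (-0.43, 0.43)
    symbols = []
    for i in range(segments):
        seg_mean = mean(z[i * seg_len:(i + 1) * seg_len])
        # Count how many breakpoints the segment mean exceeds: 0 -> a, 1 -> b, 2 -> c.
        symbols.append("abc"[sum(seg_mean > b for b in breakpoints)])
    return "".join(symbols)


def satisfies_kp(group, k, p):
    """Illustrative check: group has >= k series and every pattern has >= p members."""
    counts = Counter(coarse_pattern(s) for s in group)
    return len(group) >= k and all(c >= p for c in counts.values())


if __name__ == "__main__":
    group = [
        [1, 1, 5, 5, 9, 9, 5, 5],
        [2, 2, 6, 6, 10, 10, 6, 6],
        [9, 9, 5, 5, 1, 1, 5, 5],
        [10, 10, 6, 6, 2, 2, 6, 6],
    ]
    print(value_envelope(group))               # published value ranges
    print([coarse_pattern(s) for s in group])  # published pattern strings
    print(satisfies_kp(group, k=4, p=2))
```

In this toy group the envelope alone is wide at most timestamps and hides the rise-then-fall shape of the first two series, which is the kind of pattern loss the abstract attributes to conventional k-anonymity; the separate pattern strings keep that shape while each string is still shared by more than one series.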
