Preserving Privacy in Time Series Data Mining

Time series data mining poses new challenges to privacy. Through extensive experiments, the authors find that existing privacy-preserving techniques such as aggregation and adding random noise are insufficient against privacy attacks such as the data flow separation attack. The paper also presents a general model for publishing and mining time series data, together with its privacy issues. Based on this model, a spectrum of privacy-preserving methods is proposed. For each method, the effects on classification accuracy, aggregation error, and privacy leak are studied. Experiments evaluating the methods show that they can effectively preserve privacy without losing much classification accuracy, while keeping aggregation error within a specified limit.
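As a rough illustration of the two baseline techniques named in the abstract (aggregation and adding random noise) and of the aggregation error that publishing noisy data introduces, a minimal sketch follows. The function names, the Gaussian noise model, the relative-error definition, and the synthetic flows are all assumptions made for illustration; they are not taken from the paper and do not reproduce its methods or its data flow separation attack.

```python
import random


def add_noise(series, sigma, rng):
    """Return a copy of `series` with zero-mean Gaussian noise added to each point."""
    return [x + rng.gauss(0.0, sigma) for x in series]


def aggregate(flows):
    """Sum several equal-length series point by point."""
    return [sum(points) for points in zip(*flows)]


def relative_aggregation_error(true_agg, published_agg):
    """Total absolute error divided by the total absolute true aggregate (assumed metric)."""
    abs_err = sum(abs(t - p) for t, p in zip(true_agg, published_agg))
    scale = sum(abs(t) for t in true_agg)
    return abs_err / scale


# Hypothetical individual data flows: a slow trend and a square wave.
rng = random.Random(0)
flows = [
    [10.0 + 0.1 * t for t in range(100)],
    [20.0 + 5.0 * ((t // 10) % 2) for t in range(100)],
]

true_agg = aggregate(flows)

# Each provider perturbs its own flow before publication; only the sum is released.
noisy_flows = [add_noise(flow, sigma=2.0, rng=rng) for flow in flows]
published_agg = aggregate(noisy_flows)

error = relative_aggregation_error(true_agg, published_agg)
print(f"relative aggregation error: {error:.4f}")
```

In this sketch a larger `sigma` hides individual values more strongly but raises the aggregation error, which is the kind of trade-off the abstract says is evaluated for each proposed method.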
