Publishing Time-Series Data under Preservation of Privacy and Distance Orders

In this paper we address the problem of preserving mining accuracy as well as privacy in publishing sensitive time-series data. For example, people with heart disease do not want to disclose their electrocardiogram time-series, but they still allow mining of some accurate patterns from their time-series. Based on this observation, we introduce the related assumptions and requirements.We show that only randomization methods satisfy all assumptions, but even those methods do not satisfy the requirements. Thus, we discuss the randomization-based solutions that satisfy all assumptions and requirements. For this purpose, we use the noise averaging effect of piecewise aggregate approximation (PAA), which may alleviate the problem of destroying distance orders in randomly perturbed time-series. Based on the noise averaging effect, we first propose two naive solutions that use the random data perturbation in publishing time-series while exploiting the PAA distance in computing distances. There is, however, a tradeoff between these two solutions with respect to uncertainty and distance orders. We thus propose two more advanced solutions that take advantages of both naive solutions. Experimental results show that our advanced solutions are superior to the naive solutions.

[1]  Wenfei Fan,et al.  Keys with Upward Wildcards for XML , 2001, DEXA.

[2]  Philip S. Yu,et al.  Time Series Compressibility and Privacy , 2007, VLDB.

[3]  Eamonn J. Keogh,et al.  Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases , 2001, Knowledge and Information Systems.

[4]  Elisa Bertino,et al.  Using Anonymized Data for Classification , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[5]  Elisa Bertino,et al.  Association rule hiding , 2004, IEEE Transactions on Knowledge and Data Engineering.

[6]  Rakesh Agrawal,et al.  Privacy-preserving data mining , 2000, SIGMOD 2000.

[7]  Philip S. Yu,et al.  Privacy-Preserving Data Mining - Models and Algorithms , 2008, Advances in Database Systems.

[8]  Philip S. Yu,et al.  Privacy-Preserving Data Mining: A Survey , 2008, Handbook of Database Security.

[9]  Aryya Gangopadhyay,et al.  A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms , 2006, The VLDB Journal.

[10]  Chris Clifton,et al.  Tools for privacy preserving distributed data mining , 2002, SKDD.

[11]  Osmar R. Zaïane,et al.  Privacy Preserving Clustering by Data Transformation , 2010, J. Inf. Data Manag..

[12]  Yang-Sae Moon,et al.  Noise Control Boundary Image Matching Using Time-Series Moving Average Transform , 2008, DEXA.

[13]  Wenliang Du,et al.  Deriving private information from randomized data , 2005, SIGMOD '05.

[14]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[15]  Elisa Bertino,et al.  A Survey of Quantification of Privacy Preserving Data Mining Algorithms , 2008, Privacy-Preserving Data Mining.