Time Series Compressibility and Privacy

In this paper we study the trade-offs between time series compressibility and partial information hiding and their fundamental implications on how we should introduce uncertainty about individual values by perturbing them. More specifically, if the perturbation does not have the same compressibility properties as the original data, then it can be detected and filtered out, reducing uncertainty. Thus, by making the perturbation "similar" to the original data, we can both preserve the structure of the data better, while simultaneously making breaches harder. However, as data become more compressible, a fraction of the uncertainty can be removed if true values are leaked, revealing how they were perturbed. We formalize these notions, study the above trade-offs on real data and develop practical schemes which strike a good balance and can also be extended for on-the-fly data hiding in a streaming environment.

[1]  Radu Sion,et al.  Rights protection for discrete numeric streams , 2006, IEEE Transactions on Knowledge and Data Engineering.

[2]  Qi Wang,et al.  On the privacy preserving properties of random data perturbation techniques , 2003, Third IEEE International Conference on Data Mining.

[3]  Ambuj K. Singh,et al.  SWAT: hierarchical stream summarization in large networks , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[4]  Michael R. Chernick,et al.  Wavelet Methods for Time Series Analysis , 2001, Technometrics.

[5]  S. Muthukrishnan,et al.  Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries , 2001, VLDB.

[6]  Dennis Shasha,et al.  StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time , 2002, VLDB.

[7]  Jimeng Sun,et al.  Hiding in the Crowd: Privacy Preservation on Evolving Streams through Correlation Tracking , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[8]  Wenliang Du,et al.  Using randomized response techniques for privacy-preserving data mining , 2003, KDD '03.

[9]  Emmanuel J. Candès,et al.  Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information , 2004, IEEE Transactions on Information Theory.

[10]  I. Johnstone,et al.  Adapting to Unknown Smoothness via Wavelet Shrinkage , 1995 .

[11]  Dennis Shasha,et al.  Efficient elastic burst detection in data streams , 2003, KDD '03.

[12]  Dorothy E. Denning,et al.  Secure statistical databases with random sample queries , 1980, TODS.

[13]  Christos Faloutsos,et al.  AWSOM: Adaptive, Hands-Off Stream Mining , 2003 .

[14]  Nikos Mamoulis,et al.  One-Pass Wavelet Synopses for Maximum-Error Metrics , 2005, VLDB.

[15]  Christos Faloutsos,et al.  Adaptive, Hands-Off Stream Mining , 2003, VLDB.

[16]  Eamonn J. Keogh,et al.  UCR Time Series Data Mining Archive , 1983 .

[17]  Beng Chin Ooi,et al.  Privacy and ownership preserving of outsourced medical data , 2005, 21st International Conference on Data Engineering (ICDE'05).

[18]  Charu C. Aggarwal,et al.  On the design and quantification of privacy preserving data mining algorithms , 2001, PODS.

[19]  Keke Chen,et al.  Privacy preserving data classification with rotation perturbation , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[20]  Philip K. Chan SensorMiner : Tool Kit for Anomaly Detection in Physical Time Series ( Industrial Track Submission , 2006 .

[21]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[22]  Cynthia Dwork,et al.  Practical privacy: the SuLQ framework , 2005, PODS.

[23]  Wenliang Du,et al.  Deriving private information from randomized data , 2005, SIGMOD '05.

[24]  Minos N. Garofalakis,et al.  Wavelet synopses with error guarantees , 2002, SIGMOD '02.

[25]  David L. Donoho,et al.  De-noising by soft-thresholding , 1995, IEEE Trans. Inf. Theory.

[26]  Yufei Tao,et al.  Personalized privacy preservation , 2006, Privacy-Preserving Data Mining.

[27]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[28]  Dimitrios Gunopulos,et al.  Iterative Incremental Clustering of Time Series , 2004, EDBT.

[29]  Kun Liu,et al.  Random projection-based multiplicative data perturbation for privacy preserving distributed data mining , 2006, IEEE Transactions on Knowledge and Data Engineering.

[30]  D. Donoho,et al.  Uncertainty principles and signal recovery , 1989 .

[31]  Charu C. Aggarwal,et al.  On k-Anonymity and the Curse of Dimensionality , 2005, VLDB.

[32]  Alexandre V. Evfimievski,et al.  Limiting privacy breaches in privacy preserving data mining , 2003, PODS.

[33]  Philip S. Yu,et al.  Structural Periodic Measures for Time-Series Data , 2005, Data Mining and Knowledge Discovery.

[34]  Walter L. Smith Probability and Statistics , 1959, Nature.

[35]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[36]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[37]  Gu Si-yang,et al.  Privacy preserving association rule mining in vertically partitioned data , 2006 .

[38]  Shenghuo Zhu,et al.  A survey on wavelet applications in data mining , 2002, SKDD.

[39]  Daniel Kifer,et al.  Injecting utility into anonymized datasets , 2006, SIGMOD Conference.