Privately detecting bursts in streaming, distributed time series data

Privacy preservation for streaming data has received surprisingly little attention from computer scientists. In this paper, we consider privacy preservation for independently owned, distributed data streams. Specifically, we want to protect the privacy of each participant's data stream while identifying bursts that occur across participant streams. We define two types of privacy breach: data breaches and envelope breaches. To protect individual data, each participant transforms large subsets of its stream into small vectors that approximate the stream. These vectors are calculated by summing the coefficients of wavelet transforms at different resolutions. The participants share their vectors using bursty, self-eliminating noise, and the combined participant vectors are then used to detect bursts. We find that our approach yields accurate burst detection with reduced communication costs, and we demonstrate these findings using both real and synthetic data.
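
The pipeline described above (summarize, add noise, combine, detect) can be illustrated with a minimal Python sketch. It is not the paper's algorithm. The Haar wavelet, the signed per-level sums, the uniform zero-sum noise (standing in for the "bursty, self-eliminating" noise and generated in one place purely for illustration), the mean-based threshold test, and all function names and sample values are assumptions made for this example.

```python
import random


def haar_summary(window):
    """Compress a power-of-two-length window into one value per resolution.

    Assumption: Haar averaging/differencing, with the detail coefficients
    of each level summed into a single number.
    """
    n = len(window)
    if n == 0 or n & (n - 1):
        raise ValueError("window length must be a power of two")
    approx = [float(x) for x in window]
    summary = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        summary.append(sum((a - b) / 2 for a, b in pairs))  # summed details
        approx = [(a + b) / 2 for a, b in pairs]
    summary.append(approx[0])  # coarsest level: the window mean
    return summary


def zero_sum_noise(parties, length, scale, rng):
    """Return one noise vector per party; the vectors sum to (about) zero."""
    if parties < 2:
        raise ValueError("need at least two parties")
    noise = [[rng.uniform(-scale, scale) for _ in range(length)]
             for _ in range(parties - 1)]
    # The last party's noise cancels everyone else's, entry by entry.
    noise.append([-sum(column) for column in zip(*noise)])
    return noise


def combine(vectors):
    """Entry-wise sum of the shared vectors."""
    return [sum(column) for column in zip(*vectors)]


def has_burst(combined, threshold):
    """Hypothetical test: flag a burst when the aggregate mean is too high."""
    return combined[-1] > threshold


# Made-up example: three participants, one 8-sample window each.
streams = [
    [1, 2, 1, 2, 1, 2, 1, 2],
    [2, 1, 2, 1, 9, 9, 2, 1],
    [1, 1, 1, 1, 8, 8, 1, 1],
]
rng = random.Random(0)
summaries = [haar_summary(s) for s in streams]
noise = zero_sum_noise(len(streams), len(summaries[0]), 50.0, rng)
shared = [[v + e for v, e in zip(vec, err)]
          for vec, err in zip(summaries, noise)]
combined = combine(shared)
print(has_burst(combined, threshold=6.0))
```

Because the transform is linear and the noise cancels, the combined vector matches (up to floating-point rounding) the summary of the summed streams, while each shared vector on its own is masked. Each participant also sends only log2(n) + 1 numbers per window of n samples, which is the sense in which a scheme of this kind can reduce communication.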
