Hiding in the Crowd: Privacy Preservation on Evolving Streams through Correlation Tracking

We address the problem of preserving privacy in streams, which has received surprisingly limited attention. For static data, a well-studied and widely used approach is based on random perturbation of the data values. However, streams pose additional challenges. First, analysis of the data has to be performed incrementally, using limited processing time and buffer space, making batch approaches unsuitable. Second, the characteristics of streams evolve over time. Consequently, approaches based on global analysis of the data are not adequate. We show that it is possible to efficiently and effectively track the correlation and autocorrelation structure of multivariate streams and leverage it to add noise which maximally preserves privacy, in the sense that it is very hard to remove. Our techniques achieve much better results than previous static, global approaches, while requiring limited processing time and memory. We provide both a mathematical analysis and experimental evaluation on real data to validate the correctness, efficiency, and effectiveness of our algorithms.

[1]  Qi Wang,et al.  On the privacy preserving properties of random data perturbation techniques , 2003, Third IEEE International Conference on Data Mining.

[2]  Wei Hong,et al.  Model-Driven Data Acquisition in Sensor Networks , 2004, VLDB.

[3]  Jimeng Sun,et al.  Streaming Pattern Discovery in Multiple Time-Series , 2005, VLDB.

[4]  S. Muthukrishnan,et al.  Data streams: algorithms and applications , 2005, SODA '03.

[5]  Jennifer Widom,et al.  Models and issues in data stream systems , 2002, PODS.

[6]  Heng Tao Shen,et al.  Principal Component Analysis , 2009, Encyclopedia of Biometrics.

[7]  S. Haykin,et al.  Adaptive Filter Theory , 1986 .

[8]  Kun Liu,et al.  Random projection-based multiplicative data perturbation for privacy preserving distributed data mining , 2006, IEEE Transactions on Knowledge and Data Engineering.

[9]  R. Shanmugam Introduction to Time Series and Forecasting , 1997 .

[10]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[11]  Wenliang Du,et al.  Deriving private information from randomized data , 2005, SIGMOD '05.

[12]  K Papathanasiou,et al.  Advanced sigmoid colon cancer in pregnancy presented as an abscess , 2004, Journal of obstetrics and gynaecology : the journal of the Institute of Obstetrics and Gynaecology.

[13]  Beng Chin Ooi,et al.  Privacy and ownership preserving of outsourced medical data , 2005, 21st International Conference on Data Engineering (ICDE'05).

[14]  Charu C. Aggarwal,et al.  On the design and quantification of privacy preserving data mining algorithms , 2001, PODS.

[15]  Jaideep Vaidya,et al.  Privacy preserving association rule mining in vertically partitioned data , 2002, KDD.

[16]  Alexandre V. Evfimievski,et al.  Limiting privacy breaches in privacy preserving data mining , 2003, PODS.

[17]  Yufei Tao,et al.  Personalized privacy preservation , 2006, Privacy-Preserving Data Mining.

[18]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[19]  Dennis Shasha,et al.  StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time , 2002, VLDB.

[20]  R. O. Schmidt,et al.  Multiple emitter location and signal Parameter estimation , 1986 .

[21]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[22]  Feifei Li,et al.  Characterizing and Exploiting Reference Locality in Data Stream Applications , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[23]  Keke Chen,et al.  Privacy preserving data classification with rotation perturbation , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[24]  Dimitrios Gunopulos,et al.  Correlating synchronous and asynchronous data streams , 2003, KDD '03.

[25]  Michael Ghil,et al.  ADVANCED SPECTRAL METHODS FOR CLIMATIC TIME SERIES , 2002 .

[26]  Rafail Ostrovsky,et al.  Private Searching on Streaming Data , 2005, Journal of Cryptology.

[27]  Richard A. Davis,et al.  Introduction to time series and forecasting , 1998 .

[28]  Philip S. Yu,et al.  A Condensation Approach to Privacy Preserving Data Mining , 2004, EDBT.

[29]  Ramakrishnan Srikant,et al.  Privacy preserving OLAP , 2005, SIGMOD '05.

[30]  Alexandre V. Evfimievski,et al.  Privacy preserving mining of association rules , 2002, Inf. Syst..

[31]  Jayant R. Haritsa,et al.  A Framework for High-Accuracy Privacy-Preserving Mining , 2005, ICDE.

[32]  Wenliang Du,et al.  Using randomized response techniques for privacy-preserving data mining , 2003, KDD '03.

[33]  T. Hughes,et al.  Signals and systems , 2006, Genome Biology.

[34]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[35]  Philip S. Yu,et al.  Optimal multi-scale patterns in time series streams , 2006, SIGMOD Conference.

[36]  Brent Waters,et al.  New constructions and practical applications for private stream searching , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[37]  Daniel Kifer,et al.  Injecting utility into anonymized datasets , 2006, SIGMOD Conference.