Dissemination of anonymized streaming data

Anticipating the emergence of streaming data marketplaces, we study how to use a scalable dissemination infrastructure, composed of a number of brokers, to disseminate anonymized streaming data to a large number of clients. Clients are trusted at different anonymity levels and differ in how urgently they need the data; to serve them, we propose to integrate the anonymization process deeply into the dissemination infrastructure. More specifically, we extend existing anonymization algorithms so that anonymized data can be derived from other anonymized data with different privacy constraints, rather than only from the original microdata, a technique we call version derivation. With this flexibility, anonymized data can be generated as needed, according to the available bandwidth, on the way from the data source to the end clients. Exploiting these new opportunities, we formulate the dissemination planning problem, which aims to minimize the information loss of the disseminated data, and we design two dissemination plan optimization strategies to solve it. An experimental study on both synthetic and real datasets verifies the effectiveness of our approach.
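To make the idea of version derivation concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes a single numeric quasi-identifier, k-anonymity as the privacy constraint, and a simple range-based loss measure; the names `Group`, `derive_version`, and `information_loss` are hypothetical and chosen only for illustration. A broker holding a 2-anonymous version derives a 4-anonymous version by merging adjacent groups, without access to the original microdata.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Group:
    """One equivalence class: a generalized interval and the number of records in it."""
    low: float
    high: float
    count: int


def derive_version(groups: List[Group], k: int) -> List[Group]:
    """Derive a coarser k-anonymous version by merging adjacent groups.

    Illustrative assumption: only the already-generalized groups are read,
    so the original microdata is never needed.
    """
    if sum(g.count for g in groups) < k:
        raise ValueError("not enough records to reach the requested k")
    merged: List[Group] = []
    pending = None
    for g in sorted(groups, key=lambda g: g.low):
        if pending is None:
            pending = g
        else:
            pending = Group(min(pending.low, g.low),
                            max(pending.high, g.high),
                            pending.count + g.count)
        if pending.count >= k:
            merged.append(pending)
            pending = None
    if pending is not None:
        # A leftover group smaller than k is folded into the last emitted group.
        last = merged.pop()
        merged.append(Group(min(last.low, pending.low),
                            max(last.high, pending.high),
                            last.count + pending.count))
    return merged


def information_loss(groups: List[Group], domain_low: float, domain_high: float) -> float:
    """Average normalized interval width per record (an assumed, simplified loss measure)."""
    total = sum(g.count for g in groups)
    width = domain_high - domain_low
    return sum(g.count * (g.high - g.low) / width for g in groups) / total


# A 2-anonymous version of an age attribute, as a broker might receive it.
v2 = [Group(20, 25, 2), Group(26, 30, 3), Group(31, 40, 2), Group(41, 60, 2)]
# Derive the 4-anonymous version for a less trusted client.
v4 = derive_version(v2, k=4)
loss_v2 = information_loss(v2, 20, 60)
loss_v4 = information_loss(v4, 20, 60)
```

In this sketch the derived version satisfies the stricter constraint at the cost of wider intervals, so its loss is higher than that of the version it was derived from. This is the trade-off that dissemination planning has to weigh against the available bandwidth.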
