BangA: An Efficient and Flexible Generalization-Based Algorithm for Privacy Preserving Data Publication

Privacy-Preserving Data Publishing (PPDP) has become a critical issue for companies and organizations that release their data. k-Anonymization was proposed as a first generalization model to guard against identity disclosure of individual records in a data set. Point Access Methods (PAMs), however, have not been well studied for the problem of data anonymization. In this article, we propose another approximation algorithm for anonymization, coined BangA, that combines useful features of PAMs and clustering. It thereby achieves the fast computation and scalability of a PAM, and very high quality thanks to its density-based clustering step. Extensive experiments show the efficiency and effectiveness of our approach. Furthermore, we provide guidelines for extending BangA to achieve a relaxed form of differential privacy, which provides stronger privacy guarantees than traditional privacy definitions.
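To make the k-anonymity guarantee mentioned above concrete, the following is a minimal Python sketch of what it means for a generalized table to be k-anonymous. It is not the BangA algorithm: the function names (`is_k_anonymous`, `generalize_age`), the fixed-width age ranges, and the toy records are all assumptions introduced purely for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(record, width=10):
    """Replace an exact age with a fixed-width range.

    This is a hypothetical generalization rule chosen for the example,
    not the partitioning that BangA computes.
    """
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Toy data set: "age" and "zip" act as quasi-identifiers,
# "disease" is the sensitive attribute.
raw = [
    {"age": 23, "zip": "47906", "disease": "flu"},
    {"age": 27, "zip": "47906", "disease": "cold"},
    {"age": 34, "zip": "47906", "disease": "flu"},
    {"age": 38, "zip": "47906", "disease": "asthma"},
]

generalized = [generalize_age(r) for r in raw]

raw_ok = is_k_anonymous(raw, ["age", "zip"], k=2)
generalized_ok = is_k_anonymous(generalized, ["age", "zip"], k=2)
```

In this toy data the raw records are each unique on (age, zip), so the table is not 2-anonymous, while generalizing ages into decades yields two groups of two records each. An anonymization algorithm has to choose such groups so that each holds at least k records while losing as little information as possible.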
