Exclusive Strategy for Generalization Algorithms in Micro-data Disclosure

When generalization algorithms are known to the public, an adversary can obtain a more precise estimation of the secret table than what can be deduced from the disclosed generalization result alone. Therefore, whether a generalization algorithm satisfies a privacy property should be judged based on this estimation. In this paper, we show that computing the estimation is inherently a recursive process that incurs high complexity when generalization algorithms take a straightforward inclusive strategy. To facilitate the design of more efficient generalization algorithms, we suggest an alternative exclusive strategy, which adopts a seemingly drastic approach to eliminate the need for recursion. Surprisingly, the data utilities of the two strategies are in fact incomparable, and the exclusive strategy can provide better data utility in certain cases.
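The first claim above, that knowing the algorithm sharpens the adversary's estimate, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the four individuals, the two groupings, the "at least two distinct values per group" privacy check, and the try-finer-first `disclose` procedure. It does not implement either the inclusive or the exclusive strategy.

```python
from itertools import permutations

# Hypothetical individuals and groupings, ordered from finer to coarser.
PEOPLE = ("Ann", "Bob", "Cat", "Dan")
GROUPINGS = [
    [("Ann", "Bob"), ("Cat", "Dan")],
    [("Ann", "Bob", "Cat", "Dan")],
]

def generalize(table, grouping):
    """Release only the sorted sensitive values of each group."""
    return tuple(tuple(sorted(table[p] for p in group)) for group in grouping)

def is_private(release):
    """Assumed privacy check: every group holds at least two distinct values."""
    return all(len(set(values)) >= 2 for values in release)

def disclose(table):
    """Publicly known algorithm: release the first grouping that passes the check."""
    for index, grouping in enumerate(GROUPINGS):
        release = generalize(table, grouping)
        if is_private(release):
            return index, release
    return None

def candidates(values):
    """All distinct assignments of the given sensitive values to PEOPLE."""
    return [dict(zip(PEOPLE, perm)) for perm in sorted(set(permutations(values)))]

secret = {"Ann": "flu", "Bob": "flu", "Cat": "cold", "Dan": "cold"}
observed = disclose(secret)
all_tables = candidates(secret.values())

# Adversary who looks only at the released result.
naive = [t for t in all_tables
         if generalize(t, GROUPINGS[observed[0]]) == observed[1]]

# Adversary who also simulates the algorithm on every candidate table.
informed = [t for t in all_tables if disclose(t) == observed]

print(len(naive), len(informed))
```

In this toy setting the released result alone is consistent with six tables, while simulating the algorithm leaves only the two tables in which Ann and Bob share a value, because the finer grouping must have failed the check. A privacy property that appears to hold on the released result may therefore not hold against the informed adversary.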
