Information disclosure under realistic assumptions: privacy versus optimality

Information disclosure has attracted considerable research interest in recent years. The challenge when disclosing information is to release as much of it as possible (optimality) while guaranteeing a desired safety property for privacy, such as l-diversity. A typical disclosure algorithm uses a sequence of disclosure schemas to produce generalizations in nonincreasing order of data utility, and releases the first generalization that satisfies the safety property. In this paper, we argue that the desired safety property cannot always be guaranteed if the adversary knows the underlying disclosure algorithm. We propose a model for the additional information disclosed by an algorithm, based on the definition of a deterministic disclosure function (DDF), and define p-safe and p-optimal DDFs. We analyze the complexity of computing a p-optimal DDF. We show that deciding whether a DDF is p-optimal is NP-hard, and that the problem can be solved in polynomial time, with respect to the size of the set of all possible database instances and the length of the disclosure generalization sequence, only under specific conditions. We then consider the problem of microdata disclosure under the safety condition of l-diversity. We relax the notion of p-optimality to weak p-optimality and develop a weakly p-optimal algorithm that is polynomial in the size of the original table and the length of the generalization sequence.
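
The leak described above can be illustrated with a toy Python sketch. This is not the paper's DDF model or its algorithm. It assumes a simplified "distinct values" reading of l-diversity, a hypothetical two-schema sequence (ages grouped by decade, then fully suppressed), an invented five-row table, and an adversary who knows every individual's age and the disclosure algorithm. The names `is_l_diverse`, `naive_disclose`, and `consistent_tables` are made up for the illustration.

```python
from collections import defaultdict
from itertools import permutations

def is_l_diverse(table, generalize, l):
    """True if every group induced by `generalize` holds >= l distinct sensitive values."""
    groups = defaultdict(set)
    for age, disease in table:
        groups[generalize(age)].add(disease)
    return all(len(values) >= l for values in groups.values())

def naive_disclose(table, schemas, l):
    """Release the first generalization (highest utility first) that is l-diverse."""
    for i, generalize in enumerate(schemas):
        if is_l_diverse(table, generalize, l):
            return i, sorted((generalize(age), d) for age, d in table)
    return None

def consistent_tables(ages, schemas, l, released):
    """Adversary's view: the original tables on which the algorithm would emit `released`."""
    diseases = [d for _, d in released[1]]
    candidates = {tuple(zip(ages, p)) for p in permutations(diseases)}
    return [t for t in candidates if naive_disclose(t, schemas, l) == released]

# Hypothetical schema sequence: group ages by decade, then suppress age entirely.
schemas = [lambda age: f"{age // 10 * 10}s", lambda age: "*"]
# Invented microdata: (age, disease).
table = [(21, "flu"), (25, "flu"), (31, "cold"), (34, "hiv"), (38, "asthma")]

released = naive_disclose(table, schemas, l=2)
survivors = consistent_tables([age for age, _ in table], schemas, 2, released)
possible_for_21 = {dict(t)[21] for t in survivors}
print(released[0], len(survivors), possible_for_21)  # 1 6 {'flu'}
```

The released table on its own is 2-diverse: it places all five people in one group with four distinct diseases. An adversary who knows the algorithm, however, can reason that the decade grouping must have failed, which in this table is possible only if both people in their twenties have flu. Of the 60 original tables compatible with the release, only 6 survive that reasoning, and all of them assign flu to the 21-year-old.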
