The l-Diversity problem: Tractability and approximability

Publishing personal data without giving up privacy is becoming an increasingly important problem in different fields. In recent years, several approaches have been proposed, e.g. k-Anonymity and l-Diversity. Given an input table, these approaches partition its rows so that the computed partition satisfies some constraint, in order to prevent the inference of the individuals the data belong to. Then, the rows in the same set of the partition are made identical by suppressing some of their entries. Here we focus on the l-Diversity problem, where the attributes of the input table are divided into sensitive and quasi-identifier attributes. The goal is to partition the rows of the input table so that each set C of the partition contains at most |C|/l rows having a specific value in the sensitive attribute, and the number of suppressions is minimized. In this paper we investigate the approximation and parameterized complexity of l-Diversity. First, we prove that the problem is not approximable within factor c ln l, for some constant c > 0, even if the input table consists of only two columns, and that the problem is APX-hard, even if l = 4 and the input table contains exactly three columns. Then we give an approximation algorithm of factor m (where m + 1 is the number of columns in the input table) when the sensitive attribute ranges over an alphabet of constant size. Concerning the parameterized complexity, we prove that the problem is W[1]-hard when parameterized by the cost bound, by l, and by the size of the alphabet. Finally, we prove that the problem admits a fixed-parameter algorithm when both the maximum number of different values in a column and the number of columns are parameters.
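To make the constraint and the objective concrete, the following is a minimal Python sketch that checks whether a given partition satisfies l-diversity and computes its suppression cost. It assumes each row is a pair (quasi-identifier values, sensitive value) and adopts the suppression cost common in the k-Anonymity literature: a quasi-identifier column on which the rows of a set disagree is suppressed in every row of that set, while the sensitive value is never suppressed. The function names `is_l_diverse`, `suppression_cost` and `evaluate_partition` are hypothetical. The sketch only evaluates a given partition; it does not search for an optimal one, which is the hard part studied here.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple

# Hypothetical row encoding: (quasi-identifier values, sensitive value).
Row = Tuple[Sequence[Hashable], Hashable]


def is_l_diverse(group: List[Row], l: int) -> bool:
    """True if no sensitive value occurs in more than |C|/l rows of the set C."""
    counts = Counter(sensitive for _, sensitive in group)
    # Integer form of count <= |C|/l, avoiding floating-point division.
    return all(c * l <= len(group) for c in counts.values())


def suppression_cost(group: List[Row]) -> int:
    """Number of entries suppressed to make the quasi-identifier parts identical.

    Assumption: every quasi-identifier column on which the rows disagree
    is suppressed in all rows of the set (cost = |C| per such column).
    """
    if not group:
        return 0
    m = len(group[0][0])  # number of quasi-identifier columns
    differing = sum(1 for j in range(m) if len({qi[j] for qi, _ in group}) > 1)
    return differing * len(group)


def evaluate_partition(partition: List[List[Row]], l: int) -> Optional[int]:
    """Total suppression cost, or None if some set violates l-diversity."""
    if not all(is_l_diverse(group, l) for group in partition):
        return None
    return sum(suppression_cost(group) for group in partition)


if __name__ == "__main__":
    table = [
        (("a", 1), "flu"),
        (("a", 2), "cold"),
        (("b", 1), "flu"),
        (("b", 1), "hiv"),
    ]
    partition = [[table[0], table[1]], [table[2], table[3]]]
    print(evaluate_partition(partition, 2))  # → 2
```

In the example, the first set disagrees only on the second quasi-identifier column (2 suppressed entries) and the second set is already homogeneous, so the total cost is 2; with l = 3 the same partition is rejected, since a set of size 2 cannot hold any value at most 2/3 times.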
