The effect of homogeneity on the computational complexity of combinatorial data anonymization

A matrix M is said to be k-anonymous if for each row r in M there are at least k − 1 other rows in M that are identical to r. The NP-hard k-Anonymity problem asks, given an n × m matrix M over a fixed alphabet and an integer s > 0, whether M can be made k-anonymous by suppressing (blanking out) at most s entries. Complementing previous work, we introduce two new "data-driven" parameterizations for k-Anonymity: the number t_in of distinct input rows and the number t_out of distinct output rows, both of which model aspects of data homogeneity. We show that k-Anonymity is fixed-parameter tractable for the parameter t_in, and that it is NP-hard even for t_out = 2 and alphabet size four. Notably, our fixed-parameter tractability result implies that k-Anonymity can be solved in linear time when t_in is a constant. Our computational hardness results also extend to the related privacy problems p-Sensitivity and ℓ-Diversity, while our fixed-parameter tractability results extend to p-Sensitivity and to the use of domain generalization hierarchies, in which entries are replaced by more general data instead of being completely suppressed.
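The definitions above can be made concrete with a small Python sketch. It is an illustration of the problem statement only, not the fixed-parameter algorithm for t_in described in this work. It checks k-anonymity, counts distinct rows (t_in for the input, t_out for an output), verifies a candidate answer against the budget s, and finds a minimum suppression by exhaustive search. Assumptions: rows are tuples of strings, a suppressed entry is written "*" (a hypothetical marker assumed to lie outside the input alphabet), and all function names are hypothetical. The exhaustive search is exponential in n·m and is only meant for toy matrices.

```python
from collections import Counter
from itertools import combinations

STAR = "*"  # assumed marker for a suppressed entry; not part of the input alphabet


def is_k_anonymous(rows, k):
    """True if every row occurs at least k times, i.e. has >= k-1 identical partners."""
    counts = Counter(rows)
    return all(counts[r] >= k for r in rows)


def distinct_rows(rows):
    """Number of different rows: t_in for the input matrix, t_out for an output."""
    return len(set(rows))


def suppression_cost(original, anonymized):
    """Number of suppressed entries, or None if `anonymized` cannot be
    obtained from `original` by suppression alone."""
    if len(original) != len(anonymized):
        return None
    cost = 0
    for r, a in zip(original, anonymized):
        if len(r) != len(a):
            return None
        for x, y in zip(r, a):
            if y == STAR:
                cost += 1
            elif x != y:
                return None  # an entry was changed, not suppressed
    return cost


def is_solution(original, anonymized, k, s):
    """Check a proposed answer: suppression-only, at most s stars, k-anonymous."""
    cost = suppression_cost(original, anonymized)
    return cost is not None and cost <= s and is_k_anonymous(anonymized, k)


def suppress(rows, cells):
    """Return a copy of `rows` with the given (row, column) cells replaced by STAR."""
    out = [list(r) for r in rows]
    for i, j in cells:
        out[i][j] = STAR
    return [tuple(r) for r in out]


def min_suppressions(rows, k):
    """Try all sets of cells, smallest first (exponential; toy inputs only).

    Returns (cost, anonymized rows), or None if even suppressing every
    entry fails, which happens exactly when 0 < n < k.
    """
    if not rows:
        return 0, []
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    for size in range(len(cells) + 1):
        for chosen in combinations(cells, size):
            candidate = suppress(rows, chosen)
            if is_k_anonymous(candidate, k):
                return size, candidate
    return None


M = [("a", "b", "c"), ("a", "b", "d"), ("x", "y", "z"), ("x", "y", "z")]
cost, M2 = min_suppressions(M, 2)
print(distinct_rows(M), cost, distinct_rows(M2))  # 3 2 2  (t_in, suppressions, t_out)
print(M2)  # [('a', 'b', '*'), ('a', 'b', '*'), ('x', 'y', 'z'), ('x', 'y', 'z')]
```

In this toy instance the input has t_in = 3 distinct rows, two suppressions suffice for k = 2, and the resulting output has two distinct rows. Note that "at least k − 1 other identical rows" is checked as "every row has multiplicity at least k", which is the same condition.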
