Privacy and Utility Effects of k-Anonymity on Association Rule Hiding

In recent years, privacy preservation has attracted considerable interest because of concerns about privacy breaches when data are published and analyzed. Private information can be observed directly in published data or inferred through data mining techniques. The k-anonymity concept was originally proposed to hide sensitive attribute values that could otherwise be disclosed by a linking attack, while association rule hiding techniques have been proposed to hide sensitive patterns in mining results. However, association rule hiding suffers from side effects such as hiding failure, lost rules, and the creation of new rules. Moreover, the k-anonymity approach does not address the hiding of association rules. In this work, we extend the k-anonymity concept to hide sensitive association rules and compare it with the association rule hiding approach. We propose a novel way of measuring the privacy gain and utility loss of anonymized association rules. Numerical experiments comparing the two approaches show that k-anonymity applied to association rule mining achieves a higher privacy gain, whereas direct anonymization through association rule hiding incurs a lower utility loss. These results provide a guideline for choosing an anonymization technique under different requirements and suggest a direction for developing new association rule hiding techniques.
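
The abstract refers to privacy gain, utility loss, and the classic side effects (hiding failure, lost rules, new rules) without spelling out the formulas here, so the following is only a minimal, hypothetical sketch in Python. The toy transactions, the generalization of 'cola' and 'juice' into 'beverage', the simple one-item rule miner, and the gain/loss formulas are illustrative assumptions, not the paper's actual metrics or algorithm.

```python
from itertools import combinations

# Minimal, hypothetical sketch (not the paper's algorithm or metrics):
# mine simple 1-item -> 1-item association rules before and after a
# k-anonymity-style generalization, then count the classic side effects
# (hidden sensitive rules, lost non-sensitive rules, newly created rules).

def mine_rules(transactions, min_sup=0.4, min_conf=0.6):
    """Return the set of rules (lhs, rhs) meeting the support/confidence thresholds."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    support = lambda s: sum(1 for t in transactions if s <= t) / n
    rules = set()
    for a, b in combinations(items, 2):
        pair_sup = support({a, b})
        if pair_sup < min_sup:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            if pair_sup / support({lhs}) >= min_conf:
                rules.add((lhs, rhs))
    return rules

# Toy transactions; 'cola' -> 'bread' is treated as the sensitive rule.
original = [{'bread', 'cola'}, {'bread', 'cola'},
            {'bread', 'juice'}, {'bread', 'juice'}, {'milk', 'cola'}]

# Hypothetical generalization: replace specific drinks with the category 'beverage'.
generalize = lambda t: {('beverage' if i in {'cola', 'juice'} else i) for i in t}
anonymized = [generalize(t) for t in original]

before, after = mine_rules(original), mine_rules(anonymized)
sensitive = {('cola', 'bread')}

hidden = sensitive - after                 # sensitive rules no longer minable
lost = (before - sensitive) - after        # non-sensitive rules that disappeared
new = after - before                       # spurious rules introduced by generalization

# Illustrative (assumed) metrics, not the paper's definitions:
privacy_gain = len(hidden) / len(sensitive)
utility_loss = len(lost) / max(len(before - sensitive), 1)
print('hidden:', hidden, 'lost:', lost, 'new:', new)
print('privacy gain: %.2f  utility loss: %.2f' % (privacy_gain, utility_loss))
```

On this toy data the sensitive rule is hidden (privacy gain 1.0) but a non-sensitive rule is lost and two generalized rules appear, which is the privacy/utility trade-off the comparison in the abstract is about.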
