Towards Identity Anonymization in Large Survey Rating Data

We study the challenge of identity protection in large public survey rating data. Even though survey participants do not reveal any of their ratings, their survey records are potentially identifiable by using information from other public sources. None of the existing anonymisation principles (e.g., $k$-anonymity, $l$-diversity) can effectively prevent such breaches in large survey rating data sets. In this paper, we tackle the problem by defining the $(k, \epsilon)$-anonymity principle. The principle requires that, for each transaction $t$ in the given survey rating data $T$, at least $(k-1)$ other transactions in $T$ have ratings similar to those of $t$, where the similarity is controlled by $\epsilon$. We propose a greedy approach to anonymize survey rating data and apply the method to two real-life data sets to demonstrate its efficiency and practical utility.
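To make the $(k, \epsilon)$-anonymity requirement concrete, the following minimal Python sketch checks whether a small rating table satisfies it. The abstract does not specify the similarity measure, so the sketch assumes, purely for illustration, that two transactions are similar when every pair of corresponding ratings differs by at most $\epsilon$, and that every transaction rates the same issues with no missing values. The function names `is_similar` and `satisfies_k_eps_anonymity` are hypothetical, and the sketch only verifies the principle; it is not the greedy anonymization approach itself.

```python
from typing import Sequence


def is_similar(t: Sequence[float], u: Sequence[float], eps: float) -> bool:
    """Assumed similarity test: every pair of corresponding ratings differs by at most eps."""
    return all(abs(a - b) <= eps for a, b in zip(t, u))


def satisfies_k_eps_anonymity(T: Sequence[Sequence[float]], k: int, eps: float) -> bool:
    """Return True if each transaction in T has at least k-1 other similar transactions."""
    for i, t in enumerate(T):
        # Count the other transactions whose ratings are within eps of t.
        neighbours = sum(
            1 for j, u in enumerate(T) if j != i and is_similar(t, u, eps)
        )
        if neighbours < k - 1:
            return False
    return True


# Hypothetical ratings on a 1-5 scale; each row is one participant's transaction.
ratings = [
    [5, 4, 1],
    [4, 4, 2],
    [5, 3, 1],
    [1, 1, 5],
]

print(satisfies_k_eps_anonymity(ratings, k=2, eps=1))
print(satisfies_k_eps_anonymity(ratings[:3], k=3, eps=1))
```

Under these assumptions the last transaction has no neighbour within $\epsilon = 1$, so the full table fails the check for $k = 2$, while the first three transactions on their own pass it for $k = 3$. The check compares every pair of transactions, which is quadratic in the number of transactions and is meant only to illustrate the definition.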
