Small domain randomization

Random perturbation is a promising technique for privacy preserving data mining. It retains an original sensitive value with a certain probability and replaces it with a random value from the domain with the remaining probability. If the replacing value is chosen from a large domain, the retention probability must be small to protect privacy. For this reason, previous randomization-based approaches have poor utility. In this paper, we propose an alternative way to randomize sensitive values, called small domain randomization. First, we partition the given table into sub-tables that have smaller domains of sensitive values. Then, we randomize the sensitive values within each sub-table independently. Since each sub-table has a smaller domain, a larger retention probability is permitted. We propose this approach as an alternative to classical partition-based approaches to privacy preserving data publishing. There are two key issues: ensure the published sub-tables do not disclose more private information than what is permitted on the original table, and partition the table so that utility is maximized. We present an effective solution.

[1]  Jennifer A. Scott,et al.  Reducing the Total Bandwidth of a Sparse Unsymmetric Matrix , 2006, SIAM J. Matrix Anal. Appl..

[2]  Jayant R. Haritsa,et al.  A Framework for High-Accuracy Privacy-Preserving Mining , 2005, ICDE.

[3]  Alexandre V. Evfimievski,et al.  Limiting privacy breaches in privacy preserving data mining , 2003, PODS.

[4]  E. Cuthill,et al.  Reducing the bandwidth of sparse symmetric matrices , 1969, ACM '69.

[5]  Frank McSherry,et al.  Privacy integrated queries: an extensible platform for privacy-preserving data analysis , 2009, SIGMOD Conference.

[6]  Panos Kalnis,et al.  On the Anonymization of Sparse High-Dimensional Data , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[7]  Wenliang Du,et al.  OptRR: Optimizing Randomized Response Schemes for Privacy-Preserving Data Mining , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[8]  Minghua Chen,et al.  Optimal Random Perturbation at Multiple Privacy Levels , 2009, Proc. VLDB Endow..

[9]  Wenliang Du,et al.  Using randomized response techniques for privacy-preserving data mining , 2003, KDD '03.

[10]  S L Warner,et al.  Randomized response: a survey technique for eliminating evasive answer bias. , 1965, Journal of the American Statistical Association.

[11]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[12]  Yufei Tao,et al.  On Anti-Corruption Privacy Preserving Publication , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[13]  Panos Kalnis,et al.  Fast Data Anonymization with Low Information Loss , 2007, VLDB.

[14]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[15]  Qing Zhang,et al.  Aggregate Query Answering on Anonymized Tables , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[16]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[17]  Ramakrishnan Srikant,et al.  Privacy preserving OLAP , 2005, SIGMOD '05.

[18]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[19]  Dan Suciu,et al.  The Boundary Between Privacy and Utility in Data Publishing , 2007, VLDB.

[20]  H. Chernoff A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the sum of Observations , 1952 .

[21]  Yufei Tao,et al.  Anatomy: simple and effective privacy preservation , 2006, VLDB.

[22]  Ninghui Li,et al.  Modeling and Integrating Background Knowledge in Data Anonymization , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[23]  Philip S. Yu,et al.  A Condensation Approach to Privacy Preserving Data Mining , 2004, EDBT.

[24]  Raghu Ramakrishnan,et al.  Privacy Skyline: Privacy with Multidimensional Adversarial Knowledge , 2007, VLDB.