Paper information - Small domain randomization

Small domain randomization

Random perturbation is a promising technique for privacy-preserving data mining. It retains an original sensitive value with a certain probability and replaces it with a random value from the domain with the remaining probability. If the replacement value is drawn from a large domain, the retention probability must be small to protect privacy, which is why previous randomization-based approaches have poor utility. In this paper, we propose an alternative way to randomize sensitive values, called small domain randomization. First, we partition the given table into sub-tables that have smaller domains of sensitive values. Then, we randomize the sensitive values within each sub-table independently. Since each sub-table has a smaller domain, a larger retention probability is permitted. We propose this approach as an alternative to classical partition-based approaches to privacy-preserving data publishing. There are two key issues: ensuring that the published sub-tables do not disclose more private information than is permitted on the original table, and partitioning the table so that utility is maximized. We present an effective solution.
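The two steps described in the abstract (partition, then randomize within each sub-table) can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in and not the paper's method: the `group_of` function that assigns records to sub-tables, the fixed `retain_prob`, and the choice to treat the values occurring in a sub-table as its domain are all assumptions made for illustration. In particular, the sketch does not show how the paper chooses the partition or derives the permitted retention probability.

```python
import random
from collections import defaultdict


def perturb(values, domain, retain_prob, rng):
    """Keep each value with probability retain_prob; otherwise replace it
    with a value drawn uniformly from the given domain."""
    choices = sorted(domain)  # fixed order so a seeded rng is reproducible
    return [v if rng.random() < retain_prob else rng.choice(choices)
            for v in values]


def small_domain_randomize(rows, group_of, retain_prob, rng):
    """rows: list of (record_id, sensitive_value) pairs.
    group_of: hypothetical function mapping a sensitive value to a
    sub-table key (the paper's partitioning strategy is not shown here)."""
    sub_tables = defaultdict(list)
    for rid, value in rows:
        sub_tables[group_of(value)].append((rid, value))

    published = {}
    for key, members in sub_tables.items():
        # Assumption: the sub-table's domain is the set of values in it.
        sub_domain = {value for _, value in members}
        noisy = perturb([value for _, value in members],
                        sub_domain, retain_prob, rng)
        published[key] = [(rid, nv) for (rid, _), nv in zip(members, noisy)]
    return published


def group_of(value):
    # Illustrative two-way split of a toy disease domain.
    return "A" if value in {"flu", "cold"} else "B"


rows = [(1, "flu"), (2, "cold"), (3, "HIV"),
        (4, "cancer"), (5, "flu"), (6, "HIV")]
published = small_domain_randomize(rows, group_of, retain_prob=0.7,
                                   rng=random.Random(0))
```

Because each replacement is drawn only from the sub-table's own domain, a published value in sub-table "A" is always one of "flu" or "cold", which is the intuition behind allowing a larger retention probability than when the replacement is drawn from the full domain.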

Ke Wang | Rhonda Baldwin
