Paper information - Towards publishing set-valued data with high utility

Towards publishing set-valued data with high utility

Set-valued data are common in databases and usually contain sensitive information associated with data owners, so publishing them may lead to identity breaches. Pioneering techniques de-identify such data through k-anonymity, which may produce anonymized data of low utility. In these techniques, k-anonymity is carried out on the assumption that a preset taxonomy tree exists. In this paper, we investigate the negative influence of the taxonomy tree on data utility and propose a novel method that anonymizes data in a utility-preserving manner. We artificially construct a pseudo taxonomy tree based on utility metrics. Experiments show that our construct-then-anonymize method is not only applicable to anonymizing set-valued data but also considerably improves data utility.

Minghong Liao (廖明宏) | Sinhong Lin

[1] Charu C. Aggarwal, et al. On k-Anonymity and the Curse of Dimensionality, 2005, VLDB.

[2] Benjamin C. M. Fung, et al. Centralized and Distributed Anonymization for High-Dimensional Healthcare Data, 2010, TKDD.

[3] Jian Pei, et al. Utility-based anonymization using local recoding, 2006, KDD '06.

[4] Jiawei Han, et al. Discovery of Multiple-Level Association Rules from Large Databases, 1995, VLDB.

[5] Panos Kalnis, et al. On the Anonymization of Sparse High-Dimensional Data, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[6] Aris Gkoulalas-Divanis, et al. Utility-guided Clustering-based Transaction Data Anonymization, 2012, Trans. Data Priv.

[7] Aris Gkoulalas-Divanis, et al. Efficient and flexible anonymization of transaction data, 2012, Knowledge and Information Systems.

[8] Panos Kalnis, et al. Local and global recoding methods for anonymizing set-valued data, 2010, The VLDB Journal.

[9] Chris Clifton, et al. Thoughts on k-Anonymization, 2006, 22nd International Conference on Data Engineering Workshops (ICDEW'06).

[10] B. K. Tripathy, et al. Improved Algorithms for Anonymization of Set-Valued Data, 2012, ACITY.

[11] Jian Pei, et al. Publishing Sensitive Transactions for Itemset Utility, 2008, 2008 Eighth IEEE International Conference on Data Mining.

[12] Latanya Sweeney, et al. k-Anonymity: A Model for Protecting Privacy, 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst.

[13] Panos Kalnis, et al. Privacy-preserving anonymization of set-valued data, 2008, Proc. VLDB Endow.

[14] Jiawei Han, et al. Mining Multiple-Level Association Rules in Large Databases, 1999, IEEE Trans. Knowl. Data Eng.

[15] Aris Gkoulalas-Divanis, et al. PCTA: privacy-constrained clustering-based transaction data anonymization, 2011, PAIS '11.

[16] Jeffrey F. Naughton, et al. Anonymization of Set-Valued Data via Top-Down, Local Generalization, 2009, Proc. VLDB Endow.

[17] Bradley Malin, et al. COAT: COnstraint-based anonymization of transactions, 2010, Knowledge and Information Systems.

[18] Wendy Hui Wang, et al. Towards publishing recommendation data with predictive anonymization, 2010, ASIACCS '10.

[19] Ying Xu, et al. Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees, 2002, Bioinform.

[20] Charles T. Zahn, et al. Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters, 1971, IEEE Transactions on Computers.

[21] Philip S. Yu, et al. Anonymizing transaction databases for publication, 2008, KDD.

[22] B. Malin, et al. Anonymization of electronic medical records for validating genome-wide association studies, 2010, Proceedings of the National Academy of Sciences.

[23] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[24] Danfeng Yao, et al. The union-split algorithm and cluster-based anonymization of social networks, 2009, ASIACCS '09.

[25] Benjamin C. M. Fung, et al. Anonymizing healthcare data: a case study on the blood transfusion service, 2009, KDD.