Paper Information - Clustering with Diversity

Clustering with Diversity

We consider the clustering with diversity problem: given a set of colored points in a metric space, partition them into clusters such that each cluster has at least l points, all of which have distinct colors. We give a 2-approximation to this problem for any l when the objective is to minimize the maximum radius of any cluster. We show that the approximation ratio is optimal unless P = NP, by providing a matching lower bound. Several extensions to our algorithm have also been developed for handling outliers. This problem is mainly motivated by applications in privacy-preserving data publication.
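To make the problem statement concrete, the following is a minimal sketch of a checker for the constraints and the objective described in the abstract. It is not the paper's 2-approximation algorithm. The function names (`is_valid_clustering`, `max_radius`), the dictionary-based input layout, the use of Euclidean distance, and the choice to restrict each cluster's center to one of its own members are all illustrative assumptions, not details taken from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+ (assumed metric)


def is_valid_clustering(clusters, colors, l):
    """Check the diversity constraints on a proposed clustering.

    clusters: list of lists of point ids (hypothetical layout)
    colors:   dict mapping point id -> color
    l:        minimum cluster size
    """
    assigned = [p for cluster in clusters for p in cluster]
    # The clusters must partition the point set: every point exactly once.
    if sorted(assigned) != sorted(colors):
        return False
    for cluster in clusters:
        cluster_colors = [colors[p] for p in cluster]
        # Each cluster needs at least l points ...
        if len(cluster) < l:
            return False
        # ... and all of its points must have distinct colors.
        if len(set(cluster_colors)) != len(cluster_colors):
            return False
    return True


def max_radius(clusters, points):
    """Objective value: the largest radius over all clusters.

    Assumption: a cluster's center is the member point that minimizes
    the distance to the farthest member of the same cluster.
    """
    worst = 0.0
    for cluster in clusters:
        radius = min(
            max(dist(points[c], points[p]) for p in cluster)
            for c in cluster
        )
        worst = max(worst, radius)
    return worst


# Toy instance: four points on a line, two colors.
points = {0: (0, 0), 1: (1, 0), 2: (10, 0), 3: (11, 0)}
colors = {0: "red", 1: "blue", 2: "red", 3: "blue"}

good = [[0, 1], [2, 3]]   # each cluster has 2 points with distinct colors
bad = [[0, 2], [1, 3]]    # each cluster repeats a color

print(is_valid_clustering(good, colors, l=2), max_radius(good, points))
print(is_valid_clustering(bad, colors, l=2))
```

In the toy instance, pairing each red point with its neighboring blue point satisfies both constraints, while pairing points of the same color does not, regardless of distance.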
