Privacy-Preserving Data Publishing: A Constraint-Based Clustering Approach

Privacy-preserving data publishing has drawn much research interest recently. In this chapter, we address this topic from the viewpoint of constrained clustering, i.e., the problem of finding clusters that satisfy certain user-specified constraints. More specifically, we begin with the problem of clustering under aggregate constraints (without privacy considerations) and explain how traditional algorithms for the unconstrained problem (e.g., the c-means algorithm) break down in the presence of constraints. From there, we develop scalable algorithms that overcome this problem and finally illustrate how our algorithm can also be used for privacy-preserving data publishing.

[1]  Yufei Tao,et al.  Anatomy: simple and effective privacy preservation , 2006, VLDB.

[2]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[3]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[4]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[5]  Paul S. Bradley,et al.  Scaling Clustering Algorithms to Large Databases , 1998, KDD.

[6]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[7]  Philip S. Yu,et al.  Bottom-up generalization: a data mining solution to privacy protection , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[8]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[9]  Jiong Yang,et al.  STING: A Statistical Information Grid Approach to Spatial Data Mining , 1997, VLDB.

[10]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[11]  Ashwin Machanavajjhala,et al.  Worst-Case Background Knowledge for Privacy-Preserving Data Publishing , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[12]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[13]  Vipin Kumar,et al.  Chameleon: Hierarchical Clustering Using Dynamic Modeling , 1999, Computer.

[14]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[15]  David R. Karger,et al.  On approximating the longest path in a graph , 1997, Algorithmica.

[16]  Johannes Gehrke,et al.  Mining Very Large Databases , 1999, Computer.

[17]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[18]  Yufei Tao,et al.  Personalized privacy preservation , 2006, Privacy-Preserving Data Mining.

[19]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[20]  Anthony K. H. Tung,et al.  Constraint-based clustering in large databases , 2001, ICDT.