Latent Dirichlet Conditional Naive-Bayes Models for Privacy-Preservation Clustering

This paper introduces a model for privacy-preserving clustering that addresses both privacy preservation and distributed computing. First, the latent variables in Latent Dirichlet Conditional Naive-Bayes (LD-CNB) models are redefined and the necessary terminology is introduced. Second, variational approximate inference for LD-CNB is described in detail. Third, based on this variational inference, we design a distributed EM algorithm for privacy-preserving clustering. Finally, experiments on several datasets from the UCI repository compare LD-CNB with the distributed k-means algorithm. The results show that LD-CNB performs better and can run in a distributed setting, so it can protect private information.
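To illustrate the general idea behind a distributed EM algorithm for clustering, the following is a minimal sketch and not the LD-CNB algorithm itself. It assumes horizontally partitioned data (each site holds some of the rows) and uses plain EM for a Gaussian naive-Bayes mixture in place of variational inference. The function names `local_stats`, `merge_and_update`, and `distributed_em`, the `min_var` floor, and the toy data are all hypothetical. Each site sends only aggregated sufficient statistics to a coordinator, never raw records.

```python
import math

def local_stats(rows, weights, means, variances):
    """E-step on one site: return aggregated sufficient statistics only."""
    k, d = len(weights), len(means[0])
    n = [0.0] * k                          # soft counts per cluster
    s1 = [[0.0] * d for _ in range(k)]     # responsibility-weighted sums of x
    s2 = [[0.0] * d for _ in range(k)]     # responsibility-weighted sums of x^2
    for x in rows:
        logp = []
        for c in range(k):
            # log prior plus per-feature Gaussian log densities (naive-Bayes factorisation)
            lp = math.log(weights[c])
            for j in range(d):
                v = variances[c][j]
                lp += -0.5 * math.log(2 * math.pi * v) - (x[j] - means[c][j]) ** 2 / (2 * v)
            logp.append(lp)
        m = max(logp)                      # subtract the max for numerical stability
        p = [math.exp(lp - m) for lp in logp]
        z = sum(p)
        for c in range(k):
            r = p[c] / z                   # responsibility of cluster c for row x
            n[c] += r
            for j in range(d):
                s1[c][j] += r * x[j]
                s2[c][j] += r * x[j] * x[j]
    return n, s1, s2

def merge_and_update(stats, min_var=1e-3):
    """M-step on the coordinator: sum the site statistics and re-estimate parameters."""
    k, d = len(stats[0][0]), len(stats[0][1][0])
    n = [sum(s[0][c] for s in stats) for c in range(k)]
    total = sum(n)
    weights, means, variances = [], [], []
    for c in range(k):
        nc = max(n[c], 1e-12)              # guard against an empty cluster
        mu = [sum(s[1][c][j] for s in stats) / nc for j in range(d)]
        var = [max(sum(s[2][c][j] for s in stats) / nc - mu[j] ** 2, min_var)
               for j in range(d)]
        weights.append(max(n[c] / total, 1e-12))
        means.append(mu)
        variances.append(var)
    return weights, means, variances

def distributed_em(sites, means, iterations=25):
    """Alternate local E-steps and a central M-step for a fixed number of rounds."""
    k, d = len(means), len(means[0])
    weights = [1.0 / k] * k
    variances = [[1.0] * d for _ in range(k)]
    for _ in range(iterations):
        stats = [local_stats(rows, weights, means, variances) for rows in sites]
        weights, means, variances = merge_and_update(stats)
    return weights, means, variances

# Toy data: two sites, each holding private rows drawn around (0, 0) and (5, 5).
sites = [
    [[0.1, 0.2], [0.0, -0.1], [5.1, 4.9]],
    [[4.8, 5.2], [-0.2, 0.1], [5.0, 5.0]],
]
init_means = [[0.0, 0.0], [1.0, 1.0]]
weights, means, variances = distributed_em(sites, init_means)
```

Because the sufficient statistics are additive, the coordinator obtains the same parameters as it would by pooling all rows in one place, up to floating-point rounding. Sharing aggregates alone is not a formal privacy guarantee. A real protocol would typically add secure summation or a similar mechanism, which this sketch omits.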
