K-medoids method based on divergence for uncertain data clustering

Uncertain data clustering is an essential task in data mining research. Many traditional clustering methods have been extended with new similarity measures to address it. Unlike clustering of certain (deterministic) data, uncertain data clustering focuses on evaluating the similarity between the distributions of uncertain data objects. In this paper, we propose a novel K-medoids method for clustering uncertain data, named UK-medoids, based on the KL-divergence and the JS-divergence. Experiments on synthetic datasets show that the proposed algorithm performs well.
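To make the idea concrete, the following is a minimal sketch of K-medoids clustering with a divergence-based dissimilarity. It is not the paper's UK-medoids implementation: it assumes each uncertain object is already summarized as a discrete probability distribution over a shared set of bins, and the function names (`kl_divergence`, `js_divergence`, `k_medoids`), the `eps` smoothing constant, and the naive "first k objects" initialization are all illustrative choices.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Assumption: q is floored at eps to avoid division by zero.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def assign(d, medoids):
    """Label each object with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
        for i in range(len(d))
    ]


def k_medoids(objects, k, dist=js_divergence, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    n = len(objects)
    # Precompute the pairwise dissimilarity matrix.
    d = [[dist(objects[i], objects[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # assumption: naive deterministic initialization
    for _ in range(max_iter):
        labels = assign(d, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster becomes empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes total dissimilarity to its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(d[m][i] for i in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(d, medoids)


# Four hypothetical uncertain objects, each a distribution over three bins.
objects = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.3, 0.6],
]
medoids, labels = k_medoids(objects, k=2)
```

The JS-divergence is used as the default here because it is symmetric and bounded, which suits a medoid-based method that only needs pairwise dissimilarities. Passing `dist=kl_divergence` swaps in the asymmetric KL-divergence instead; how the paper combines the two measures is not specified in the abstract.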
