Constraint Based Subspace Clustering for High Dimensional Uncertain Data

Both uncertain data and high-dimensional data pose huge challenges to traditional clustering algorithms. It is even more challenging for clustering high dimensional uncertain data and there are few such algorithms. In this paper, based on the classical FINDIT subspace clustering algorithm for high dimensional data, we propose a constraint based semi-supervised subspace clustering algorithm for high dimensional uncertain data, UFINDIT. We extend both the distance functions and dimension voting rules of FINDIT to deal with high dimensional uncertain data. Since the soundness criteria of FINDIT fails for uncertain data, we introduce constraints to solve the problem. We also use the constraints to improve FINDIT in eliminating parameters' effect on the process of merging medoids. Furthermore, we propose some methods such as sampling to get an more efficient algorithm. Experimental results on synthetic and real data sets show that our proposed UFINDIT algorithm outperforms the existing subspace clustering algorithm for uncertain data.

[1]  Céline Robardet,et al.  Constraint-Based Subspace Clustering , 2009, SDM.

[2]  Philip S. Yu,et al.  Finding generalized projected clusters in high dimensional spaces , 2000, SIGMOD 2000.

[3]  Ian Davidson,et al.  Constrained Clustering: Advances in Algorithms, Theory, and Applications , 2008 .

[4]  Hans-Peter Kriegel,et al.  Density-Connected Subspace Clustering for High-Dimensional Data , 2004, SDM.

[5]  Tomer Hertz,et al.  Learning a Mahalanobis Metric from Equivalence Constraints , 2005, J. Mach. Learn. Res..

[6]  Anil K. Jain Data clustering: 50 years beyond K-means , 2010, Pattern Recognit. Lett..

[7]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[8]  Philip S. Yu,et al.  A Survey of Uncertain Data Algorithms and Applications , 2009, IEEE Transactions on Knowledge and Data Engineering.

[9]  Kien A. Hua,et al.  Constrained locally weighted clustering , 2008, Proc. VLDB Endow..

[10]  Andrea Tagarelli,et al.  Clustering Uncertain Data Via K-Medoids , 2008, SUM.

[11]  Alok N. Choudhary,et al.  Adaptive Grids for Clustering Massive Data Sets , 2001, SDM.

[12]  Myoung-Ho Kim,et al.  FINDIT: a fast and intelligent subspace clustering algorithm using dimension voting , 2004, Inf. Softw. Technol..

[13]  Xianchao Zhang,et al.  Constraint Based Dimension Correlation and Distance Divergence for Clustering High-Dimensional Data , 2010, 2010 IEEE International Conference on Data Mining.

[14]  Reynold Cheng,et al.  Uncertain Data Mining: An Example in Clustering Location Data , 2006, PAKDD.

[15]  Thomas Seidl,et al.  Subspace Clustering for Uncertain Data , 2010, SDM.

[16]  Hans-Peter Kriegel,et al.  Hierarchical density-based clustering of uncertain data , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[17]  Hans-Peter Kriegel,et al.  Density-based clustering of uncertain data , 2005, KDD '05.

[18]  Xianchao Zhang,et al.  Novel Density-Based Clustering Algorithms for Uncertain Data , 2014, AAAI.

[19]  Yi Zhang,et al.  Entropy-based subspace clustering for mining numerical data , 1999, KDD '99.

[20]  Philip S. Yu,et al.  Fast algorithms for projected clustering , 1999, SIGMOD '99.