
A subspace filter supporting the discovery of small clusters in very noisy datasets

Feature selection becomes crucial when exploring high-dimensional datasets via clustering: the data are unlikely to form groups jointly in all dimensions, yet clustering algorithms treat all attributes equally. A new subspace filter approach is presented that copes with the difficult situation of finding small clusters embedded in a very noisy environment (more noise than cluster data) and that is not misled by dense, high-dimensional spots caused by density fluctuations of single attributes. Experimental evaluation on artificial and real datasets demonstrates good performance and high efficiency.
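The following minimal Python sketch illustrates why dense multi-dimensional spots can be misleading. It is a generic observed-versus-expected check, not the filter proposed in the paper. It bins two attributes into an equal-width grid and compares each cell's observed count with the count expected if the attributes were independent, i.e. n · P(x-bin) · P(y-bin). The helper names `bin_index` and `cell_lift`, the equal-width grid, and the independence baseline are hypothetical choices made for this illustration.

```python
from collections import Counter
from itertools import product
from typing import Dict, Sequence, Tuple


def bin_index(value: float, lo: float, hi: float, bins: int) -> int:
    """Map a value to one of `bins` equal-width intervals over [lo, hi]."""
    if hi == lo:
        return 0  # constant attribute: everything falls into one bin
    idx = int((value - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # the maximum value belongs to the last bin


def cell_lift(xs: Sequence[float], ys: Sequence[float],
              bins: int) -> Dict[Tuple[int, int], float]:
    """Observed / expected count for every non-empty 2D grid cell.

    Hypothetical helper. Expected counts assume the attributes are independent:
    expected(i, j) = n * P(x in bin i) * P(y in bin j) = cx[i] * cy[j] / n.
    A lift near 1 means the cell is dense only because its marginals are dense.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    bx = [bin_index(v, x_lo, x_hi, bins) for v in xs]
    by = [bin_index(v, y_lo, y_hi, bins) for v in ys]
    cx, cy = Counter(bx), Counter(by)  # 1D marginal counts
    joint = Counter(zip(bx, by))       # 2D cell counts
    return {cell: observed / (cx[cell[0]] * cy[cell[1]] / n)
            for cell, observed in joint.items()}


if __name__ == "__main__":
    # Two independent attributes, each dense near 0:
    # every combination of the marginal values occurs exactly once.
    marginal = [0.0, 0.1, 0.2, 1.0]
    pairs = list(product(marginal, repeat=2))
    print(cell_lift([p[0] for p in pairs], [p[1] for p in pairs], bins=2))
    # → {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 1.0}

    # Perfectly correlated attributes: all points lie on the diagonal.
    diag = [0.0, 1.0, 2.0, 3.0]
    print(cell_lift(diag, diag, bins=4))
    # → {(0, 0): 4.0, (1, 1): 4.0, (2, 2): 4.0, (3, 3): 4.0}
```

In the first example, cell (0, 0) holds 9 of the 16 points, yet its lift is 1.0 because that density comes entirely from the two single-attribute marginals. In the second example, every occupied cell holds four times its expected count, which signals genuine joint structure rather than a marginal artifact.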

Frank Höppner
