Paper information - A Soft Subspace Clustering Method for Text Data Using a Probability Based Feature Weighting Scheme

Clustering methods aim to find groups of similar objects in a given data set. Common soft subspace clustering methods for text data find different clusters in subspaces using a weighted distance measure. The weighting scheme heavily affects clustering performance and requires special consideration. Since text data carries semantic as well as syntactic information, a weighting scheme that uses semantic information is more likely to produce a better clustering solution.
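
To make the idea of a weighted distance measure concrete, the following is a minimal sketch of the assignment step in a generic soft subspace clustering setup, where every cluster carries its own weight for every term. It is not the probability based weighting scheme of the paper, which the abstract does not specify. The function names (`weighted_sq_distances`, `assign`), the toy document-term matrix, the centroids, and the weight values are all assumptions chosen for illustration.

```python
import numpy as np

def weighted_sq_distances(X, centers, weights):
    """Return an (n_docs, n_clusters) matrix of weighted squared distances.

    X:       (n_docs, n_terms) document-term matrix
    centers: (n_clusters, n_terms) cluster centroids
    weights: (n_clusters, n_terms) non-negative per-cluster term weights
    """
    # Pairwise differences between every document and every centroid.
    diff = X[:, None, :] - centers[None, :, :]  # shape (n_docs, n_clusters, n_terms)
    # Weight each squared difference by the weight its cluster gives that term.
    return np.einsum("nkm,km->nk", diff ** 2, weights)

def assign(X, centers, weights):
    """Assign each document to the cluster with the smallest weighted distance."""
    return weighted_sq_distances(X, centers, weights).argmin(axis=1)

# Hypothetical toy data: terms 0 and 1 separate the groups, term 2 is noise.
X = np.array([[1.0, 0.0, 5.0],
              [0.9, 0.1, 0.0],
              [0.0, 1.0, 5.0],
              [0.1, 0.9, 0.0]])
centers = np.array([[1.0, 0.0, 2.5],
                    [0.0, 1.0, 2.5]])
# Illustrative weights: each cluster emphasises a different term and
# ignores the noisy third term entirely.
weights = np.array([[0.9, 0.1, 0.0],
                    [0.1, 0.9, 0.0]])

dists = weighted_sq_distances(X, centers, weights)
labels = assign(X, centers, weights)
print(labels)
```

Because the third term has zero weight in both clusters, its large variation does not influence the assignment. In a full algorithm the weights would be re-estimated on each iteration rather than fixed by hand as they are here.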
