A Soft Subspace Clustering Method for Text Data Using a Probability Based Feature Weighting Scheme

Clustering methods aim to find groups of similar objects in a given data set. Common soft subspace clustering methods for text data find clusters in different subspaces by using a weighted distance measure. The weighting scheme strongly affects clustering performance and therefore requires special consideration. Because text data carries semantic as well as syntactic information, a weighting scheme that uses semantic information is more likely to produce a better clustering solution.
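To make the idea of a weighted distance measure concrete, the following is a minimal sketch of the assignment step in a generic soft subspace clustering setup, where each cluster carries its own feature weights. It is not the probability-based weighting scheme of this paper: the function names, the toy term-frequency vectors, and the hand-picked weights are all assumptions made for illustration.

```python
def weighted_sq_distance(x, center, weights):
    """Squared Euclidean distance with one weight per feature."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, center))


def assign_clusters(docs, centers, cluster_weights):
    """Assign each document to the cluster with the smallest weighted distance.

    cluster_weights[k][j] is the weight of feature j in cluster k, so each
    cluster measures distance in its own (soft) subspace.
    """
    labels = []
    for x in docs:
        dists = [weighted_sq_distance(x, c, w)
                 for c, w in zip(centers, cluster_weights)]
        labels.append(dists.index(min(dists)))
    return labels


# Toy term-frequency vectors over three hypothetical terms (assumed data).
docs = [[5.0, 0.0, 2.0], [4.0, 1.0, 9.0], [0.0, 5.0, 2.0], [1.0, 4.0, 9.0]]
centers = [[4.5, 0.5, 5.0], [0.5, 4.5, 5.0]]
# Hand-picked weights (assumed): both clusters down-weight the third term,
# which varies widely but does not separate the two groups.
cluster_weights = [[0.6, 0.3, 0.1], [0.3, 0.6, 0.1]]

labels = assign_clusters(docs, centers, cluster_weights)
print(labels)
```

In a full algorithm the weights would be learned rather than fixed, which is where the choice of weighting scheme enters; here they are constants so that the effect of weighting on the distance is easy to see.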
