A NOTE ON WEIGHTED FUZZY K-MEANS CLUSTERING FOR CONCEPT DECOMPOSITION

When reducing the dimensionality of a corpus, concept decomposition (CD) based on fuzzy K-means (FKM) clustering provides a better approximation than CD based on spherical K-means clustering. However, the performance of the FKM algorithm is limited by its distance metric, and it has been shown that assigning feature weights can improve the performance of FKM. Our work builds on this analysis and proposes two approaches to feature-weight selection. Using four test document collections, we demonstrate that CD based on the proposed feature-weighted FKM provides a better approximation than CD based on FKM while maintaining retrieval quality.
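To make the pipeline concrete, the following is a minimal sketch, not the authors' implementation, of CD built on a feature-weighted FKM. It assumes a weighted Euclidean distance of the form d_w^2(x, v) = sum_j w_j^2 (x_j - v_j)^2, a form used in the feature-weighted fuzzy c-means literature. The weight vector `w` is taken as a given input; the two weight-selection approaches proposed here are not reproduced. The concept matrix is formed from the FKM centroids, and the corpus is approximated by its least-squares projection onto their span. The function names `weighted_fkm` and `concept_decomposition_error` and the toy data are hypothetical, chosen for illustration only.

```python
import numpy as np

def _centroids(X, U, m):
    """Fuzzy centroids: v_k = sum_i u_ik^m x_i / sum_i u_ik^m."""
    Um = U ** m
    return (Um.T @ X) / Um.sum(axis=0)[:, None]

def weighted_fkm(X, k, w, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy K-means with (assumed, externally supplied) feature weights w.

    X: (n_docs, n_terms) array, one document per row.
    w: (n_terms,) non-negative weights; w = ones recovers plain FKM.
    Returns memberships U (n_docs, k) and centroids V (k, n_terms).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)          # each document's memberships sum to 1
    for _ in range(iters):
        V = _centroids(X, U, m)
        diff = X[:, None, :] - V[None, :, :]   # shape (n_docs, k, n_terms)
        # Squared weighted Euclidean distance: sum_j w_j^2 (x_j - v_j)^2
        D = np.einsum('ikj,j->ik', diff ** 2, w ** 2)
        D = np.fmax(D, 1e-12)                  # guard against zero distances
        inv = D ** (-1.0 / (m - 1.0))          # standard FCM membership update
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, _centroids(X, U, m)

def concept_decomposition_error(A, C):
    """Frobenius error of the least-squares CD  A ~ C Z,  Z = argmin ||A - C Z||_F.

    A: (n_terms, n_docs) term-document matrix; C: (n_terms, k) concept matrix.
    """
    Z, *_ = np.linalg.lstsq(C, A, rcond=None)
    return np.linalg.norm(A - C @ Z, 'fro')

# Toy usage: 20 random unit-length "documents" over 8 terms.
rng = np.random.default_rng(1)
X = rng.random((20, 8))
X /= np.linalg.norm(X, axis=1, keepdims=True)
U, V = weighted_fkm(X, k=3, w=np.ones(8))      # uniform weights, i.e. plain FKM
C = V.T                                        # concept vectors as columns
err = concept_decomposition_error(X.T, C)
print(f"CD approximation error: {err:.4f}")
```

Because the projection onto span(C) is a least-squares fit, the error never exceeds the Frobenius norm of the term-document matrix. Comparing `err` across different weight vectors `w` at a fixed k is one way to reproduce the kind of approximation comparison the abstract describes.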
