An Efficient Document Clustering by Optimization Technique for Cluster Optimality

Clustering has become a widely used technique with the growth of the web, which makes fast, high-quality clustering an important problem. Document clustering aims to identify semantically related groups within an unstructured collection of text documents. Feature selection is important to the clustering process because irrelevant or redundant features can mislead the clustering results. Earlier work applied an improved niching memetic algorithm and an improved genetic algorithm (GA) to feature selection. More accurate document clustering, however, requires more informative features, including optimal conceptual weights. In this paper, we present an optimization technique that evaluates cluster optimality for efficient document clustering based on optimized conceptual feature words. The conceptual words (similarity words) are extracted from the feature words through a feature selection process, and the importance of each cluster word is determined by its optimal conceptual word weight. Experiments evaluate the proposed optimization technique for document clustering in terms of conceptual word weight, number of conceptual words, and optimal conceptual word weight.

Keywords: clustering, conceptual words, cluster optimality
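The abstract does not give a formula for the conceptual word weight. As a rough, hypothetical illustration of weighting feature words and keeping only the most informative ones, the sketch below uses standard TF-IDF as a stand-in for the conceptual word weight and keeps the top-k terms by weight summed over the corpus. The function names `tfidf_weights` and `select_features`, the whitespace tokenizer, and the top-k selection rule are all assumptions, not the paper's method.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per document.

    Assumption: plain TF-IDF stands in for the paper's conceptual word weight.
    A term that occurs in every document gets weight 0 because log(N/N) = 0.
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


def select_features(weights, k):
    """Hypothetical selection rule: keep the k terms with the highest summed weight."""
    scores = Counter()
    for w in weights:
        scores.update(w)  # Counter.update adds the float weights term by term
    return [term for term, _ in scores.most_common(k)]
```

For example, with the three toy documents `"genetic algorithm clustering"`, `"genetic algorithm feature selection"` and `"document clustering text"`, `select_features(tfidf_weights(docs), 2)` keeps `document` and `text`, which occur in only one short document each and therefore score highest.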

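The abstract mentions earlier work that applied a genetic algorithm (GA) to feature selection, and the proposed method optimizes which conceptual words to keep. The following is a minimal, generic GA sketch for choosing a subset of feature words; it is not the authors' algorithm. Each individual is a binary mask over candidate words. The fitness function is a hypothetical toy that rewards total word weight and rejects subsets larger than `max_terms`, loosely mirroring the trade-off between conceptual word weight and number of conceptual words; in a real clustering pipeline, fitness would instead score the clusters produced with the selected features. All names and parameter values (`pop_size`, `generations`, `mutation_rate`, `seed`) are assumptions.

```python
import random


def fitness(mask, term_weights, max_terms):
    """Hypothetical fitness: total weight of selected terms, 0 if too many are selected."""
    chosen = [w for bit, w in zip(mask, term_weights) if bit]
    if len(chosen) > max_terms:
        return 0.0
    return sum(chosen)


def genetic_feature_selection(term_weights, max_terms, pop_size=30,
                              generations=60, mutation_rate=0.05, seed=0):
    """Search for a binary feature mask with a simple generational GA."""
    n = len(term_weights)
    if n < 2:
        raise ValueError("need at least two candidate terms for crossover")
    rng = random.Random(seed)  # seeded for reproducible runs

    def score(mask):
        return fitness(mask, term_weights, max_terms)

    # Random initial population of bit masks.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def tournament():
        # Binary tournament selection from the current population.
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    best = max(pop, key=score)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
        best = max(pop + [best], key=score)
    return best, score(best)
```

Tournament selection, one-point crossover, bit-flip mutation and elitism are standard GA operators chosen here for brevity; because the best mask is always carried forward, the best fitness never decreases from one generation to the next.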