Hierarchical Document Clustering Using Closed Itemsets with Comparison Using Weka Tools

Today, large volumes of information are distributed and stored in text databases across large networks, and the number of document files keeps growing. An efficient and robust way to group this data is therefore needed. Clustering is a widely used data mining tool for organizing information: it places similar objects in the same cluster and dissimilar objects in different clusters, based on a similarity measure, so that inter-cluster similarity is minimized and intra-cluster similarity is maximized. However, most clustering techniques face issues such as high dimensionality, scalability, and accuracy. Document clustering is an unsupervised method for organizing documents that supports fast information retrieval and filtering. This paper reviews several document clustering methods and proposes a new approach to hierarchical document clustering using closed itemsets.
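As a rough illustration of what a closed itemset means for documents, the sketch below is not the method proposed in this paper. It treats each document as a set of terms, finds term sets that occur in at least `min_support` documents, and keeps only the closed ones, meaning those with no proper superset covering exactly the same documents. The function name `closed_frequent_itemsets`, the toy documents, and the brute-force enumeration are all assumptions made for illustration; a practical miner would use a dedicated algorithm instead.

```python
from itertools import combinations


def closed_frequent_itemsets(docs, min_support):
    """Map each closed frequent term set to the indices of the documents containing it.

    Hypothetical helper for illustration only: brute-force enumeration,
    suitable for toy inputs.
    """
    terms = sorted(set().union(*docs))
    frequent = {}
    for size in range(1, len(terms) + 1):
        found_any = False
        for combo in combinations(terms, size):
            # Cover = documents that contain every term of the candidate set.
            cover = frozenset(i for i, d in enumerate(docs) if set(combo) <= d)
            if len(cover) >= min_support:
                frequent[frozenset(combo)] = cover
                found_any = True
        if not found_any:
            # No frequent set of this size, so no larger set can be frequent.
            break
    # Closed: no proper superset is contained in exactly the same documents.
    return {
        itemset: cover
        for itemset, cover in frequent.items()
        if not any(itemset < other and cover == frequent[other] for other in frequent)
    }


# Toy corpus (assumed data): each document is reduced to a set of terms.
docs = [
    {"apple", "fruit", "juice"},
    {"apple", "fruit"},
    {"apple", "fruit", "pie"},
    {"cluster", "data"},
    {"cluster", "data", "mining"},
]
closed = closed_frequent_itemsets(docs, min_support=2)
for itemset, cover in closed.items():
    print(sorted(itemset), sorted(cover))
```

Each closed itemset can serve as a candidate cluster label, with its covering documents as the cluster members. One plausible way to obtain a hierarchy, again only as an assumption, is to nest clusters by the subset relation between their itemsets.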
