Hierarchical Document Clustering: Review with Comparison

Hierarchical document clustering is the automatic organization of documents into a hierarchy of clusters, such that documents within a cluster are highly similar to one another and dissimilar to documents in other clusters. It is based on the principle of maximizing intra-cluster similarity and minimizing inter-cluster similarity. It has been studied intensively because of its wide applicability in areas such as web mining, search engines, and information retrieval. It provides an efficient representation and visualization of a document collection, which also makes the collection easy to navigate. In this paper, we give an overview of hierarchical document clustering, covering its feature selection process, applications, challenges, similarity measures, and the evaluation of document clustering algorithms. Various hierarchical document clustering techniques are discussed along with their pros and cons.
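The Python sketch below illustrates the principle of maximizing intra-cluster similarity. It clusters a few toy documents bottom-up using bag-of-words term-frequency vectors, cosine similarity, and group-average (UPGMA) linkage. Several parts are illustrative assumptions and are not taken from any particular surveyed technique: the whitespace tokenizer, the choice of linkage, the function names (`tf_vector`, `cosine`, `average_linkage`, `group_average_clustering`), and the toy corpus. The code searches every cluster pair at every merge (roughly cubic time overall), so it suits only small collections.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed: lower-cased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def average_linkage(x, y, sim):
    """Group-average similarity: mean pairwise similarity between members of x and y."""
    return sum(sim[i][j] for i in x for j in y) / (len(x) * len(y))


def group_average_clustering(docs):
    """Bottom-up (agglomerative) hierarchical clustering of `docs`.

    Returns the merges in order as (cluster_a, cluster_b, similarity), where each
    cluster is a tuple of document indices. Read top to bottom, the merges build
    the dendrogram from the leaves up to the root.
    """
    vectors = [tf_vector(d) for d in docs]
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [(i,) for i in range(n)]  # start with one singleton cluster per document
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters whose members are, on average, most similar.
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], sim))
        merges.append((a, b, average_linkage(a, b, sim)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges


# Hypothetical toy corpus: two search-engine documents and two biology documents.
docs = [
    "search engine query ranking",
    "search engine web query",
    "protein gene sequence",
    "gene protein expression",
]

if __name__ == "__main__":
    for a, b, s in group_average_clustering(docs):
        print(a, b, round(s, 3))
```

On this corpus, the two search-engine documents merge first (similarity 0.75) and the two biology documents merge next. The two groups join last at similarity 0, because they share no terms. Cutting the merge sequence at a similarity threshold yields a flat clustering. Switching to single or complete linkage would change only `average_linkage`.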
