Hierarchical Document Clustering: A Review

As the number of text documents on the internet grows rapidly, grouping similar documents for a wide range of applications has attracted considerable research attention. However, most clustering methods struggle with high dimensionality, scalability, accuracy, and the generation of meaningful cluster labels. This paper reviews well-known document clustering methods and explains hierarchical document clustering in detail. The review finds that hierarchical document clustering performs well, but there is still room to improve on the problems mentioned above.
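To make the idea of hierarchical document clustering concrete, here is a minimal sketch of bottom-up (agglomerative) clustering. It assumes plain whitespace tokenization, smoothed TF-IDF weighting, cosine similarity and average linkage. These choices, along with the function names `tfidf_vectors`, `cosine` and `agglomerative` and the toy documents, are illustrative assumptions. They do not reproduce the method of any particular paper covered by this review. Each distinct term becomes one vector dimension, which is where the high-dimensionality problem comes from.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map raw strings to sparse TF-IDF vectors (dict: term -> weight).

    Assumption: naive lowercase whitespace tokenization and a smoothed IDF,
    log((1 + n) / (1 + df)) + 1, so that no weight is ever zero.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def agglomerative(docs, k):
    """Average-link agglomerative clustering; stops when k clusters remain.

    Returns the final clusters (lists of document indices) and the merge
    history, which records the hierarchy (dendrogram) built bottom-up.
    """
    vecs = tfidf_vectors(docs)
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start: every document is its own cluster
    merges = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average link: mean pairwise similarity between the two clusters.
                s = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                s /= len(clusters[a]) * len(clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), s))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, merges


# Hypothetical toy corpus: two documents about clustering, two about itemset mining.
docs = [
    "hierarchical clustering of text documents",
    "clustering text documents hierarchically",
    "frequent itemsets for association rule mining",
    "mining frequent itemsets and association rules",
]
clusters, merges = agglomerative(docs, k=2)
print(sorted(sorted(c) for c in clusters))
```

On this toy corpus the two clustering documents end up in one cluster and the two itemset documents in the other. The naive loop recomputes every pairwise cluster similarity at each merge, so its cost grows roughly with the cube of the number of documents. This illustrates the scalability problem noted above. Library implementations such as SciPy's `scipy.cluster.hierarchy.linkage` use more efficient algorithms.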