Hierarchical Star Clustering Algorithm for Dynamic Document Collections

In this paper, a new clustering algorithm called Dynamic Hierarchical Star is introduced. Our approach constructs a hierarchy of overlapping clusters and is designed to handle dynamic data sets. Experimental results on several benchmark text collections show that the method produces smaller hierarchies than traditional algorithms while achieving similar clustering quality. We therefore advocate its use for tasks that require dynamic overlapping clustering, such as information organization, creation of document taxonomies, and hierarchical topic detection.
