Paper information - Incremental document clustering using Multi-representation Indexing Tree


Incremental document clustering is a powerful technique for large-scale topic discovery from incrementally growing document collections. The indexing tree algorithm is efficient, but it tends to handle only spherical clusters. To address this problem, we present a novel Multi-Representation Indexing Tree (MRIT) algorithm that constructs a hierarchy able to accommodate clusters of arbitrary shape while maintaining good performance. In contrast to the indexing tree algorithm, a cluster is decomposed into several sub-clusters and is represented as the union of those sub-clusters rather than by the cluster center alone. The similarity of a document to a cluster is measured by the distance to the nearest neighbor among the cluster's representative points. Experimental results on a variety of domains demonstrate that our algorithm produces high-quality clusters, is insensitive to document input order, and is efficient in computational time.
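The nearest-representative idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MRIT implementation: it builds no indexing tree, and the cosine distance, the `threshold` parameter, the function names, and the simplification that every member document serves as a representative point are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def distance_to_cluster(doc, representatives):
    """Distance from a document to a cluster: the distance to the
    nearest of the cluster's representative points."""
    return min(cosine_distance(doc, rep) for rep in representatives)


def assign_incrementally(docs, threshold=0.2):
    """Process documents one at a time (hypothetical threshold rule).

    A document joins the nearest cluster if its nearest-representative
    distance is within `threshold`; otherwise it opens a new cluster.
    """
    clusters = []  # each cluster is a list of representative points
    labels = []
    for doc in docs:
        best_idx, best_dist = None, float("inf")
        for idx, reps in enumerate(clusters):
            d = distance_to_cluster(doc, reps)
            if d < best_dist:
                best_idx, best_dist = idx, d
        if best_idx is not None and best_dist <= threshold:
            # Simplification: every member becomes a representative.
            clusters[best_idx].append(doc)
            labels.append(best_idx)
        else:
            clusters.append([doc])
            labels.append(len(clusters) - 1)
    return labels, clusters


# Toy vectors: the first three form an elongated chain, the fourth is unrelated.
docs = [[1.0, 0.0, 0.0], [0.866, 0.5, 0.0], [0.5, 0.866, 0.0], [0.0, 0.0, 1.0]]
labels, clusters = assign_incrementally(docs, threshold=0.2)
print(labels)
```

In this toy input the third vector is about 0.5 away from the first but only about 0.13 away from the second, so it joins the same cluster through its nearest representative. A rule based on a single cluster center would place it farther away, which is the elongated-cluster situation the abstract says a center-only representation handles poorly.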

[1] Ramayya Krishnan, et al. Incremental hierarchical clustering of text documents, 2006, CIKM '06.

[2] Kuo Zhang, et al. New event detection based on indexing-tree and named entity, 2007, SIGIR.

[3] Chung-Chian Hsu, et al. Incremental clustering of mixed data based on distance hierarchy, 2008, Expert Syst. Appl.

[4] Marie-Francine Moens, et al. An Aspect Based Document Representation for Event Clustering: SA-OT accounts for pronoun resolution in child language, 2009.

[5] P. Sopp. Cluster analysis, 1996, Veterinary Immunology and Immunopathology.

[6] Mohamed S. Kamel, et al. Incremental document clustering using cluster similarity histograms, 2003, Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003).

[7] Min Zhang, et al. Automatic online news issue construction in web environment, 2008, WWW.

[8] Ada Wai-Chee Fu, et al. Incremental Document Clustering for Web Page Classification, 2002.

[9] Rajeev Motwani, et al. Incremental Clustering and Dynamic Information Retrieval, 2004, SIAM J. Comput.

[10] John Yen, et al. An incremental approach to building a cluster hierarchy, 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings.