New Trending Events Detection based on the Multi-Representation Index Tree Clustering

Traditional clustering is a powerful technique for revealing hot topics in Web information. However, it fails to discover trending events that emerge gradually. In this paper, we propose a novel method to address this problem, which we model as detecting new clusters in a time-ordered stream of documents. Our approach comprises three parts: a cluster definition based on the Multi-Representation Index Tree (MI-Tree), a new-cluster detection process, and metrics for measuring a new cluster. In contrast to the traditional method, we process the newly arriving data first and then merge the old clustering tree into the new one. This ordering prevents highly similar documents from being assigned to different clusters. We designed and implemented a system for practical application. Experimental results on a variety of domains demonstrate that our algorithm recognizes valuable new clusters during the iteration process and produces quality clusters.
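The "new data first, then merge the old clusters" ordering can be illustrated with a minimal sketch. It is not the MI-Tree algorithm itself: it assumes flat clusters in place of a tree, bag-of-words term counts as the only representation, cosine similarity, and a hypothetical threshold of 0.3. All function names and the `is_new` flag are illustrative choices.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_batch(docs, threshold):
    """Greedy single-pass clustering of one batch (assumed stand-in for tree building)."""
    clusters = []
    for doc in docs:
        vec = Counter(doc.lower().split())
        best = max(clusters, key=lambda c: cosine(c["vec"], vec), default=None)
        if best is not None and cosine(best["vec"], vec) >= threshold:
            best["vec"].update(vec)  # add term counts to the cluster vector
            best["docs"].append(doc)
        else:
            clusters.append({"vec": vec, "docs": [doc], "is_new": True})
    return clusters


def merge_old_into_new(old_clusters, new_clusters, threshold):
    """Fold each old cluster into its most similar new cluster, if similar enough.

    Note: matched new clusters are updated in place.
    """
    merged = list(new_clusters)
    for old in old_clusters:
        best = max(new_clusters, key=lambda c: cosine(c["vec"], old["vec"]), default=None)
        if best is not None and cosine(best["vec"], old["vec"]) >= threshold:
            best["vec"].update(old["vec"])
            best["docs"].extend(old["docs"])
            best["is_new"] = False  # the topic already existed before this batch
        else:
            merged.append({**old, "is_new": False})
    return merged


THRESHOLD = 0.3  # hypothetical value, chosen only for this toy data

old_docs = ["election vote results", "election vote count"]
new_docs = ["election vote recount", "earthquake hits coast", "earthquake coast damage"]

old_clusters = cluster_batch(old_docs, THRESHOLD)
new_clusters = cluster_batch(new_docs, THRESHOLD)  # the new batch is clustered on its own
merged = merge_old_into_new(old_clusters, new_clusters, THRESHOLD)

# Clusters with no counterpart among the old clusters are candidate new events.
trending = [c for c in merged if c["is_new"]]
```

In this toy run the election documents from both batches end up in one cluster, while the earthquake cluster has no match among the old clusters and is the only one left flagged as new. The paper's metrics for measuring a new cluster are not reproduced here; the `is_new` flag is a crude placeholder for them.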
