An Online Hot Topics Detection Approach Using the Improved Ant Colony Text Clustering Algorithm

Recently, with an increasing number of major events spreading across the Internet, research on online hot topics detection systems has attracted growing attention. In this paper, we propose an unsupervised and efficient hot topics detection approach based on an improved ant colony text clustering (IACTC) algorithm. To address the deficiencies of the basic ant colony text clustering (BACTC) method, the proposed algorithm introduces the inverse tangent function as a novel probability function that can be flexibly adjusted. To reduce the ants' blind movement, we add a memory organ to each ant and develop the concept of an adaptive moving range. Additionally, we design a practical system framework, with technical details, to solve the problems involved in a real-time hot topics detection system (e.g., information collection, web preprocessing, and text processing). This system is also used to evaluate the proposed method. The experimental results show that our approach has numerous advantages and achieves satisfactory results.
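
As an illustration of how an inverse tangent function can serve as an adjustable probability function in ant-based clustering, the following minimal sketch maps a neighborhood similarity score to a drop probability and a pick-up probability. The exact functional form used by IACTC is not stated in this abstract, so the scaling, the function names, and the `steepness` and `midpoint` parameters are all assumptions made for illustration only.

```python
import math


def drop_probability(similarity: float, steepness: float = 5.0,
                     midpoint: float = 0.5) -> float:
    """Hypothetical arctan-based drop probability (assumed form, not the paper's).

    atan() ranges over (-pi/2, pi/2), so dividing by pi and adding 0.5
    yields a value strictly between 0 and 1 that increases with similarity.
    """
    return 0.5 + math.atan(steepness * (similarity - midpoint)) / math.pi


def pick_probability(similarity: float, steepness: float = 5.0,
                     midpoint: float = 0.5) -> float:
    """Assumed complement: an ant is more likely to pick up a document
    that is dissimilar to its current neighborhood."""
    return 1.0 - drop_probability(similarity, steepness, midpoint)


if __name__ == "__main__":
    # Compare a gentle curve with a sharp one across several similarity values.
    for s in (0.1, 0.5, 0.9):
        gentle = drop_probability(s, steepness=2.0)
        sharp = drop_probability(s, steepness=20.0)
        print(f"similarity={s:.1f}  gentle={gentle:.3f}  sharp={sharp:.3f}")
```

In this sketch, raising `steepness` makes the curve approach a hard threshold at `midpoint`, while lowering it makes the ants' decisions more gradual; this is one plausible reading of a probability function that can be "flexibly adjusted".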
