Method and system for detecting emergent topics in text streams of a micro-blog platform

The invention provides a method and system for detecting emergent topics in the text streams of a micro-blog platform. The method comprises the following steps: (1) user data and user-generated content of the micro-blog platform are collected in real time, and the text and images of each post are extracted; (2) a time window is set, and the post text is divided by it into a real-time data stream and historical data; (3) features are selected, and a popularity evaluation model and a long micro-blog extraction model are trained; (4) the popularity evaluation model is applied to the real-time data stream, and posts evaluated as popular are placed in a popular information set, while the long micro-blog extraction model is applied to the same stream, and the extracted long micro-blog content is placed in a long micro-blog set; (5) it is judged whether the sizes of the popular information set and the long micro-blog set reach preset thresholds; if so, emergent topics are extracted from the data in the two sets by means of an LDA model or a weighted summation; if not, the method returns to step (1).
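
Steps (2), (4) and (5) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Post` fields, the repost-count and character-length rules standing in for the two trained models, the weights, and all threshold values are assumptions chosen for illustration, and only the weighted-summation branch of step (5) is shown (the LDA branch is omitted).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    timestamp: float  # seconds; hypothetical field
    reposts: int = 0  # hypothetical popularity feature


def split_by_window(posts, now, window_seconds):
    """Step (2): posts inside the window are the real-time stream, the rest is history."""
    realtime = [p for p in posts if now - p.timestamp <= window_seconds]
    history = [p for p in posts if now - p.timestamp > window_seconds]
    return realtime, history


def is_popular(post, min_reposts=10):
    # Assumption: a fixed repost cutoff stands in for the trained popularity model.
    return post.reposts >= min_reposts


def is_long(post, min_chars=140):
    # Assumption: a length cutoff stands in for the trained long micro-blog model.
    return len(post.text) >= min_chars


def extract_topics(popular, long_posts, w_popular=1.0, w_long=0.5, top_k=3):
    """Step (5), weighted-summation branch: score each term by the weighted count
    of its occurrences in the two sets and return the top_k terms."""
    scores = Counter()
    for weight, group in ((w_popular, popular), (w_long, long_posts)):
        for post in group:
            for term in post.text.lower().split():
                scores[term] += weight
    return [term for term, _ in scores.most_common(top_k)]


def detect(posts, now, window_seconds, min_popular, min_long):
    """Steps (2), (4), (5): return topic terms, or None if either set is too small."""
    realtime, _history = split_by_window(posts, now, window_seconds)
    popular = [p for p in realtime if is_popular(p)]
    long_posts = [p for p in realtime if is_long(p)]
    if len(popular) >= min_popular and len(long_posts) >= min_long:
        return extract_topics(popular, long_posts)
    return None  # thresholds not reached; the caller goes back to collecting data
```

A caller would run `detect` once per time window and, on a `None` result, continue collecting as in step (1). Terms here are split on whitespace, which would have to be replaced by a proper word segmenter for Chinese micro-blog text.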
