Emergent topic detection method and system for text streams of a micro-blog platform
The invention provides a method and system for detecting emergent topics in the text streams of a micro-blog platform. The method comprises the following steps: (1) user data and user-generated content from the micro-blog platform are collected in real time, and the message text and images are extracted; (2) a time window is set and the message text is divided by it, yielding a real-time data stream and historical data; (3) features are selected, and a popularity evaluation model and a long micro-blog extraction model are trained; (4) the popularity evaluation model is applied to the real-time data stream and messages evaluated as popular are placed in a popular information set, while the long micro-blog extraction model is applied to the same stream and the extracted long micro-blog content is placed in a long micro-blog set; (5) it is judged whether the number of items in the popular information set and in the long micro-blog set reaches preset thresholds; if so, emergent topics are extracted from the data of the two sets, either with an LDA (Latent Dirichlet Allocation) model or by weighted summation; if not, the method returns to step (1).
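Steps (2), (4) and (5) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Post` fields, the repost-count and character-length rules standing in for the two trained models, the weights, and the requirement that both thresholds be met are all assumptions made for illustration. Only the weighted-summation branch of step (5) is shown; the LDA branch is omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Post:
    timestamp: float   # seconds; hypothetical field
    text: str
    reposts: int = 0   # hypothetical popularity feature


def split_by_window(posts, now, window_seconds):
    """Step (2): posts inside the window are the real-time stream, older ones are historical."""
    cutoff = now - window_seconds
    realtime = [p for p in posts if p.timestamp >= cutoff]
    historical = [p for p in posts if p.timestamp < cutoff]
    return realtime, historical


def is_popular(post, min_reposts=10):
    """Assumed stand-in for the trained popularity evaluation model."""
    return post.reposts >= min_reposts


def is_long(post, min_chars=140):
    """Assumed stand-in for the trained long micro-blog extraction model."""
    return len(post.text) >= min_chars


def extract_topics(popular, long_posts, w_popular=1.0, w_long=0.5, top_k=3):
    """Step (5), weighted summation: score each term by its weighted count over both sets."""
    scores = Counter()
    for posts, weight in ((popular, w_popular), (long_posts, w_long)):
        for post in posts:
            for term in post.text.lower().split():
                scores[term] += weight
    return [term for term, _ in scores.most_common(top_k)]


def detect(posts, now, window_seconds, min_popular, min_long):
    """Steps (2), (4), (5): return topic terms, or None if the thresholds are not reached."""
    realtime, _ = split_by_window(posts, now, window_seconds)
    popular = [p for p in realtime if is_popular(p)]
    long_posts = [p for p in realtime if is_long(p)]
    # Assumption: both sets must reach their thresholds before extraction runs.
    if len(popular) >= min_popular and len(long_posts) >= min_long:
        return extract_topics(popular, long_posts)
    return None  # the caller would go back to collecting data, as in step (1)
```

A caller would invoke `detect` each time new posts arrive and treat a `None` result as the signal to keep collecting. Splitting on whitespace is a simplification; Chinese micro-blog text would need a word segmenter before term counting.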