Natural disaster topic extraction in Sina microblogging based on graph analysis

Abstract

In this paper, we propose a novel graph-analysis approach that applies a community structure detection algorithm to the keyword graph of microblogging data in order to detect topics. Furthermore, to account for the specific characteristics of Sina microblogging, we propose a novel keyword filtering model and a graph generation algorithm that together meet the dual requirements of topic detection and community detection. We validate our approach on a large natural disaster dataset from Sina microblog, containing about 10^3 microblogging posts with about 10^4 distinct feature tags. The experimental results reveal the relationship between the keywords and the natural disaster topics. Our method is scalable and adapts to changes in the amount of data. In particular, the detected topics provide abundant information about natural disasters, which can help the government guide disaster rescue efforts.
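The general idea of detecting topics as groups in a keyword graph can be illustrated with a minimal sketch. It is not the filtering model or graph generation algorithm proposed in the paper: the function names, the `min_weight` threshold, and the toy posts are all hypothetical, and plain connected components are used here as a deliberately crude stand-in for a real community detection algorithm.

```python
from collections import defaultdict
from itertools import combinations


def build_keyword_graph(posts, min_weight=2):
    """Return {keyword: {neighbor: weight}}, keeping only keyword pairs
    that co-occur in at least min_weight posts (hypothetical filter)."""
    counts = defaultdict(int)
    for keywords in posts:
        # Count each unordered keyword pair once per post.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), weight in counts.items():
        if weight >= min_weight:
            graph[a][b] = weight
            graph[b][a] = weight
    return dict(graph)


def connected_components(graph):
    """Group keywords into connected components.
    Assumption: this replaces a proper community detection step."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(n for n in graph[node] if n not in seen)
        components.append(component)
    return components


# Toy posts, already reduced to keyword lists (invented for illustration).
posts = [
    ["earthquake", "rescue", "sichuan"],
    ["earthquake", "rescue", "casualties"],
    ["earthquake", "sichuan"],
    ["typhoon", "flood", "evacuation"],
    ["typhoon", "flood"],
    ["flood", "rescue"],
]
topics = connected_components(build_keyword_graph(posts))
```

With the threshold set to 2, weak links such as the single "flood"/"rescue" co-occurrence are dropped, so the earthquake and typhoon keywords end up in separate groups. A modularity-based method would make this split from the graph structure itself rather than from a hard threshold.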
