Real-time identification of urban rainstorm waterlogging disasters based on Weibo big data

With the acceleration of urbanisation in China, preventing and reducing the economic losses and casualties caused by urban rainstorm waterlogging disasters has become a critical and difficult issue for the government. Because urban rainstorms are sudden, clustered, and continuous, and cause huge economic losses, emergency management is difficult. A more scientific method for real-time disaster identification will help prevent losses in a timely manner. Mining social media big data is a feasible way to obtain on-site disaster information and carry out disaster risk assessments. This paper presents a real-time identification method for urban rainstorm waterlogging disasters based on Weibo data. Taking the June 2016 heavy rainstorm in Nanjing as a case study, the collected Weibo data are split in an 8:2 ratio into a training set and a test set. Text pre-processing is then performed, using the Jieba module for Chinese word segmentation. Next, the term frequency–inverse document frequency (TF-IDF) method is used to compute feature-item weights and extract features, and hashing is applied to process the resulting high-dimensional sparse vector matrices. Finally, naive Bayes, support vector machine (SVM), and random forest text classifiers are trained, and each model is evaluated on the test set to select the optimal classification algorithm. The experiments showed that the naive Bayes algorithm achieved the highest macro-averaged accuracy.
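The pipeline in the abstract (8:2 split, segmentation, TF-IDF weighting, hashing of sparse vectors, classifier training, macro-averaged evaluation) can be illustrated for its naive Bayes branch with the minimal plain-Python sketch below. It rests on several assumptions that are not taken from the paper: posts are assumed to be already segmented (for example by Jieba, which is not called here); feature hashing with `zlib.crc32` into a hypothetical 1,024 buckets stands in for the paper's unspecified hashing algorithm; the smoothed IDF formula is one common variant; and "macro-average" is read as the unweighted mean of per-class precision and recall. All names (`hashed_counts`, `fit_idf`, `tfidf`, `TinyMultinomialNB`, `split_80_20`, `macro_precision_recall`, `train_model`, `run_pipeline`, `TOY_POSTS`) are hypothetical, `TOY_POSTS` is made-up data, and the SVM and random forest branches are not shown.

```python
# Minimal sketch of the naive Bayes branch under stated assumptions; not the authors' code.
import math
import random
import zlib
from collections import Counter, defaultdict

N_FEATURES = 2 ** 10  # hypothetical hash-space size (number of buckets)

def hashed_counts(tokens, n_features=N_FEATURES):
    """Hashing trick: map a segmented post to a sparse {bucket: count} dict.
    crc32 is used because it is stable across runs, unlike the salted hash() for str."""
    return Counter(zlib.crc32(t.encode("utf-8")) % n_features for t in tokens)

def fit_idf(count_vectors):
    """Smoothed IDF per bucket: log((1 + N) / (1 + df)) + 1, always positive."""
    n_docs = len(count_vectors)
    df = Counter()
    for vec in count_vectors:
        df.update(vec.keys())
    return {j: math.log((1 + n_docs) / (1 + d)) + 1 for j, d in df.items()}

def tfidf(counts, idf):
    """Length-normalised TF times IDF; buckets never seen in training are dropped."""
    total = sum(counts.values())
    return {j: (c / total) * idf[j] for j, c in counts.items() if j in idf}

class TinyMultinomialNB:
    """From-scratch multinomial naive Bayes over non-negative sparse weights."""
    def __init__(self, alpha=1.0, n_features=N_FEATURES):
        self.alpha = alpha  # additive (Laplace) smoothing
        self.n_features = n_features

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.log_lik, self.log_unseen = {}, {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            totals = defaultdict(float)  # summed TF-IDF weight per bucket in class c
            for x in rows:
                for j, w in x.items():
                    totals[j] += w
            denom = sum(totals.values()) + self.alpha * self.n_features
            self.log_lik[c] = {j: math.log((v + self.alpha) / denom) for j, v in totals.items()}
            self.log_unseen[c] = math.log(self.alpha / denom)  # bucket never seen in class c
        return self

    def predict_one(self, x):
        def log_posterior(c):
            lik, unseen = self.log_lik[c], self.log_unseen[c]
            return self.log_prior[c] + sum(w * lik.get(j, unseen) for j, w in x.items())
        return max(self.classes, key=log_posterior)

def split_80_20(samples, seed=0):
    """Shuffle, then keep 8 parts for training and 2 parts for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10
    return shuffled[:cut], shuffled[cut:]

def macro_precision_recall(y_true, y_pred):
    """Unweighted mean of per-class precision and of per-class recall."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        n_pred = sum(1 for p in y_pred if p == c)
        n_true = sum(1 for t in y_true if t == c)
        precisions.append(tp / n_pred if n_pred else 0.0)
        recalls.append(tp / n_true if n_true else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)

def train_model(train):
    counts = [hashed_counts(tokens) for tokens, _ in train]
    idf = fit_idf(counts)
    y = [label for _, label in train]
    model = TinyMultinomialNB().fit([tfidf(c, idf) for c in counts], y)
    return model, idf

def run_pipeline(samples, seed=0):
    train, test = split_80_20(samples, seed)
    model, idf = train_model(train)
    y_pred = [model.predict_one(tfidf(hashed_counts(tokens), idf)) for tokens, _ in test]
    return macro_precision_recall([label for _, label in test], y_pred)

# Made-up, pre-segmented posts; English glosses stand in for Jieba tokens.
TOY_POSTS = [(text.split(), label) for text, label in [
    ("rainstorm road waterlogging car stranded", "disaster"),
    ("waterlogging subway flooded rainstorm", "disaster"),
    ("heavy rain street flooded waterlogging", "disaster"),
    ("car stranded flooded underpass", "disaster"),
    ("rainstorm waterlogging help stranded", "disaster"),
    ("coffee weekend friends happy", "other"),
    ("movie tonight friends", "other"),
    ("new phone happy", "other"),
    ("weekend movie coffee", "other"),
    ("concert tickets friends happy", "other"),
]]
```

Calling `run_pipeline(TOY_POSTS)` returns the macro-averaged precision and recall on the two held-out posts; with so little data the numbers illustrate the mechanics only, not real performance. Multinomial naive Bayes is used here because it accepts non-negative fractional weights such as TF-IDF scores, which is also why the hashed features are kept unsigned. On a real corpus, scikit-learn's `HashingVectorizer` (with `alternate_sign=False`), `TfidfTransformer` and `MultinomialNB` could replace these from-scratch pieces, and the SVM and random forest branches would reuse the same split and evaluation.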
