Comparison of different machine learning techniques on location extraction by utilizing geo-tagged tweets: A case study

Abstract In emergencies, Twitter is an important platform for gaining situational awareness as events unfold. Information about Twitter users' locations is therefore fundamental to understanding the effects of a disaster. Location extraction is challenging, however, because most Twitter users do not share their locations in their tweets. Various methods have been proposed for location extraction, drawing on fields such as statistics and machine learning. This case study uses geo-tagged tweets to demonstrate the importance of location in disaster management by considering three cases. Tweets are collected with the keyword “earthquake” in order to determine the location of Twitter users. The tweets are evaluated with the Latent Dirichlet Allocation (LDA) topic model and with sentiment analysis performed by machine learning classification algorithms: Multinomial and Gaussian Naive Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, Extra Trees, Neural Network, k Nearest Neighbor (kNN), Stochastic Gradient Descent (SGD), and Adaptive Boosting (AdaBoost). In total, 10 machine learning algorithms are applied to sentiment analysis of location-specific, disaster-related tweets, with the aim of enabling a fast and accurate response in a disaster situation. The effectiveness of each algorithm is evaluated in order to identify the most suitable one. In addition, topic extraction via LDA is used to understand the situation after a disaster. The results from the three cases indicate that the Multinomial Naive Bayes and Extra Trees algorithms give the best results, with an F-measure above 80%. The study aims to support a quick response to earthquakes by applying these techniques.
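As a rough illustration of the kind of evaluation summarized above, the following is a minimal sketch of a Multinomial Naive Bayes sentiment classifier scored with the F-measure, written in plain Python. It is not the study's implementation: the whitespace tokenizer, the Laplace smoothing constant `alpha`, the function names, and the four toy training tweets are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Collect per-class document and word counts for Multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
        "n_docs": len(docs),
    }


def predict(model, doc):
    """Return the class with the highest log posterior for one document."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        # Log prior of the class.
        score = math.log(count / model["n_docs"])
        total_words = sum(model["word_counts"][label].values())
        for token in tokenize(doc):
            if token not in model["vocab"]:
                continue  # ignore words never seen in training
            # Laplace-smoothed log likelihood of the token given the class.
            numerator = model["word_counts"][label][token] + alpha
            score += math.log(numerator / (total_words + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def f_measure(y_true, y_pred, positive):
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, invented for this sketch (not from the study).
TRAIN_DOCS = [
    "earthquake destroyed my house terrible",
    "scared after the earthquake so many injured",
    "everyone is safe after the earthquake thankful",
    "rescue teams did great work thankful",
]
TRAIN_LABELS = ["negative", "negative", "positive", "positive"]

if __name__ == "__main__":
    model = train_multinomial_nb(TRAIN_DOCS, TRAIN_LABELS)
    test_docs = ["terrible earthquake many injured", "thankful everyone is safe"]
    test_labels = ["negative", "positive"]
    predictions = [predict(model, d) for d in test_docs]
    print(predictions, f_measure(test_labels, predictions, positive="negative"))
```

Comparing several classifiers, as the study does, would amount to producing predictions from each one on the same held-out tweets and comparing the resulting F-measure values.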
