Application of named entity recognition on tweets during earthquake disaster: a deep learning-based approach

Twitter is heavily used during disasters and emergencies, which makes it an important source of essential information. Named entity recognition (NER), the process of identifying the elementary units in a text and classifying them into predefined categories, plays a significant role in extracting essential and useful information. However, NER on Twitter is challenging because the text is informal and contains grammatical errors and nonstandard abbreviations. In this paper, recurrent neural network (RNN)-based approaches, combined with NER tools and evaluated with a variety of activation and optimization functions, are used to extract named entities such as organizations, persons, and locations from tweets. Inputs for the RNN models are provided by two NER tools: the Natural Language Toolkit (NLTK) and the General Architecture for Text Engineering (GATE). The pre-labeled data are then represented with GloVe word embeddings, and RNN variants such as LSTM, bidirectional LSTM (BLSTM), and GRU are trained and compared. The best-performing RNN variants for predicting named entities are then presented. The Yellowbrick visualization library is used to evaluate the proposed method, and the Wilcoxon signed-rank test is applied to the results on two different data sets to demonstrate the method's consistency. The method is also compared with existing machine learning methods. Experiments on the Nepal earthquake Twitter data set show that the RNN-based approaches achieve good results in finding named entities. In emergencies, these results can help reduce the effort required to detect event locations and support better disaster management.
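The abstract states that the Wilcoxon signed-rank test is applied to results on two data sets. As a minimal, standard-library-only sketch of what that test computes (not the authors' code), the function below ranks the absolute paired differences, sums the ranks by sign, and derives a two-sided p-value with the normal approximation. The names `wilcoxon_signed_rank`, `average_ranks`, and `WilcoxonResult` are hypothetical. Discarding zero differences before ranking is one common convention (it is the default in SciPy's `scipy.stats.wilcoxon`) and is assumed here.

```python
import math
from collections import Counter
from typing import NamedTuple, Sequence


class WilcoxonResult(NamedTuple):
    w_plus: float     # sum of ranks of positive differences
    w_minus: float    # sum of ranks of negative differences
    statistic: float  # T = min(W+, W-)
    z: float          # normal-approximation z-score of T
    p_value: float    # two-sided p-value from the normal approximation


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values in ascending order (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> WilcoxonResult:
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Zero differences are discarded before ranking. The p-value uses the
    normal approximation with a tie correction and no continuity
    correction, so it is only a rough guide for very small samples.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [d for d in (a - b for a, b in zip(x, y)) if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    statistic = min(w_plus, w_minus)

    # Mean and variance of the statistic under H0, corrected for tied |d| values.
    mean = n * (n + 1) / 4
    tie_term = sum(t**3 - t for t in Counter(abs_diffs).values())
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (statistic - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return WilcoxonResult(w_plus, w_minus, statistic, z, p_value)
```

A call such as `wilcoxon_signed_rank(scores_a, scores_b)` takes two equal-length sequences of paired scores. The abstract does not say which values the paper paired across its two data sets, so the inputs are left to the caller. The normal approximation is crude for the handful of pairs such a comparison may yield; SciPy's `scipy.stats.wilcoxon` can also compute exact p-values for small samples.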
