Tweets Classification with BERT in the Field of Disaster Management

Crisis informatics focuses on the contribution of user-generated content (UGC) to disaster management. To use social media data effectively, it is crucial to filter noisy information out of the large data stream so that disaster damage can be estimated more accurately from these data. Dissatisfied with basic keyword-based filtering, many researchers have turned to machine learning for a solution. In this project, I apply deep learning techniques to the problem of classifying Tweets in the disaster management field. The Tweet labels reflect different types of disaster-related information, which have different potential uses in emergency response. In particular, BERT is used for transfer learning. The standard BERT architecture for classification and several customized BERT architectures are trained and compared with a baseline bidirectional LSTM that uses pretrained GloVe Twitter embeddings. Results show that BERT and the BERT-based LSTM attain the best results, outperforming the baseline model by 3.29% on average in terms of F1 score. Ambiguity and subjectivity affect the performance of these models considerably. On some examples, the models can surpass human performance.
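Because the comparison between models rests on F1 score, a minimal sketch of how such a score could be computed for a multi-class Tweet classifier may help. The abstract does not state which averaging scheme is used, so macro-averaging is an assumption here, and the label names and predictions are made-up illustrations rather than data from this project.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} from parallel lists of gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted label was wrong
            fn[gold] += 1  # gold label was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging; not stated in the abstract)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical labels and predictions, for illustration only.
gold = ["damage", "damage", "donation", "irrelevant", "irrelevant", "donation"]
baseline_pred = ["damage", "irrelevant", "donation", "irrelevant", "damage", "donation"]
candidate_pred = ["damage", "damage", "donation", "irrelevant", "damage", "donation"]

baseline_score = macro_f1(gold, baseline_pred)
candidate_score = macro_f1(gold, candidate_pred)
print(f"baseline macro-F1:  {baseline_score:.4f}")
print(f"candidate macro-F1: {candidate_score:.4f}")
print(f"difference:         {candidate_score - baseline_score:.4f}")
```

Macro-averaging weights every label equally, which matters when some disaster-related categories are rare; a micro- or weighted average would give different numbers on the same predictions.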
