Capturing Public Concerns About Coronavirus Using Arabic Tweets: An NLP-Driven Approach

Analyzing people's reactions to and opinions about Coronavirus (COVID-19) requires a computational framework that leverages machine learning (ML) and natural language processing (NLP) techniques to identify COVID-related tweets and then categorize them by disease-specific feelings, so that societal concerns related to Safety, Worry, and Irony about COVID can be addressed. This is an ongoing study, and the purpose of this paper is to present initial results on determining the relevance of tweets and on what Arabic-speaking people were tweeting with respect to these three disease-related feelings. A two-stage classifier system combining ML and NLP techniques was built: the first stage finds tweets relevant to COVID, and the second stage assigns each relevant tweet to one of the three categories. Results indicated that the number of tweets by males and by females was similar. Classification performance was high for both relevance (F = 0.85) and categorization (F = 0.79). Our study demonstrates how categories of discussion on Twitter about an epidemic can be discovered, so that officials can understand specific societal concerns related to the emotions and feelings surrounding the epidemic.
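The two-stage design and the F-measure reported above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the models or features used, so both stages are passed in as plain callables, and the keyword rules (`toy_is_relevant`, `toy_categorize`) are hypothetical stand-ins that only make the example runnable. The F-measure is assumed to be the usual harmonic mean of precision and recall.

```python
from typing import Callable, Optional, Sequence

# Category names taken from the abstract.
CATEGORIES = ("Safety", "Worry", "Irony")


def two_stage_classify(
    tweet: str,
    is_relevant: Callable[[str], bool],
    categorize: Callable[[str], str],
) -> Optional[str]:
    """Stage 1 filters for COVID relevance; stage 2 labels relevant tweets.

    Returns None for tweets rejected by stage 1.
    """
    if not is_relevant(tweet):
        return None
    label = categorize(tweet)
    if label not in CATEGORIES:
        raise ValueError(f"unexpected category: {label!r}")
    return label


def f1_score(gold: Sequence, predicted: Sequence, positive) -> float:
    """F-measure for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical keyword rules standing in for trained classifiers.
def toy_is_relevant(tweet: str) -> bool:
    return "covid" in tweet.lower()


def toy_categorize(tweet: str) -> str:
    text = tweet.lower()
    if "anxious" in text:
        return "Worry"
    if "lol" in text:
        return "Irony"
    return "Safety"


if __name__ == "__main__":
    for t in ["Nice weather today", "COVID makes me anxious"]:
        print(t, "->", two_stage_classify(t, toy_is_relevant, toy_categorize))
```

In a real system each callable would wrap a classifier trained on annotated Arabic tweets, and the relevance and categorization scores would be computed separately, one per stage.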
