Classifying Information Sources in Arabic Twitter to Support Online Monitoring of Infectious Diseases

Social media has vast untapped potential for monitoring the spread of infectious diseases around the world. Most previous research has focussed on English-language content, and the Arabic Twitter universe has been studied far less. Motivated by concerns about the trust, quality and reliability of information online, we consider the variety of sources from which that information originates. As a first step, we find that numerous accounts disseminate information via Arabic social media, and we group them into five types of source: academic, media, government, health professional, and public. We perform two experiments. First, we test whether native speakers can manually classify tweets into these five groups; second, we repeat the task using various Machine Learning (ML) classifiers. We find that inter-annotator agreement on this task is 0.84, and that ML classifiers correctly identify the source type of a tweet with 77.2% accuracy when given no knowledge of the user or their bio or profile, but with 99.9% accuracy when provided with this information.
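To make the agreement figure concrete, the sketch below shows one common way an inter-annotator agreement score can be computed for a five-way labelling task. The abstract does not say which coefficient was used, so Cohen's kappa for two annotators is an assumption here, and the labels are invented toy data rather than the study's annotations. The function name `cohen_kappa` is likewise illustrative only.

```python
from collections import Counter

# The five source types named in the abstract.
SOURCE_TYPES = ["academic", "media", "government", "health professional", "public"]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: the abstract does not name its agreement coefficient;
    kappa is used here purely as an illustration.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented toy annotations for ten tweets (not data from the study).
annotator_1 = ["media", "public", "government", "public", "academic",
               "health professional", "media", "public", "government", "public"]
annotator_2 = ["media", "public", "government", "media", "academic",
               "health professional", "media", "public", "government", "public"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa on toy data: {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one category (here, plausibly "public") dominates the labels. With more than two annotators, a multi-rater coefficient such as Fleiss' kappa or Krippendorff's alpha would be the usual choice.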
