Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014

As a pedagogical demonstration of Twitter data analysis, a case study of HIV/AIDS-related tweets around World AIDS Day, 2014, was presented. This study examined whether Twitter users from countries with different income levels responded differently to World AIDS Day, and evaluated the performance of support vector machine (SVM) models as classifiers of relevant tweets. A total of 1,826 randomly sampled HIV/AIDS-related original tweets posted from November 30 through December 2, 2014, were manually coded. Logistic regression was applied to analyze the association between the World Bank-designated income level of users’ self-reported countries and tweet content. To identify the optimal SVM model, 1,278 (70%) of the 1,826 sampled tweets were randomly selected as the training set, and the remaining 548 (30%) served as the test set. Another 180 tweets were separately sampled and coded as the held-out dataset. Compared with tweets from low-income countries, tweets from Organization for Economic Cooperation and Development countries had 60% lower odds of mentioning epidemiology (adjusted odds ratio, aOR = 0.404; 95% CI: 0.166, 0.981) and three times the odds of mentioning compassion/support (aOR = 3.080; 95% CI: 1.179, 8.047). Tweets from lower-middle-income countries had 79% lower odds than tweets from low-income countries of mentioning HIV-affected sub-populations (aOR = 0.213; 95% CI: 0.068, 0.664). The optimal SVM model identified relevant tweets in the held-out dataset of 180 tweets with an F1 score of 0.72. This study demonstrated how students can be taught to analyze Twitter data using manual coding, regression models, and SVM models.
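The arithmetic behind the figures above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`split_indices`, `percent_lower_odds`, `f1_score`) and the random seed are hypothetical, chosen only for illustration. It assumes a simple random 70/30 partition, reads "X% lower odds" as (1 - aOR) x 100, and uses the standard definition of F1 for a binary relevant/irrelevant label.

```python
import random


def split_indices(n_items, train_fraction=0.7, seed=0):
    """Randomly partition item indices into a training set and a test set."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary illustrative choice
    n_train = round(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


def percent_lower_odds(odds_ratio):
    """Convert an odds ratio below 1 into a 'percent lower odds' figure."""
    return (1.0 - odds_ratio) * 100.0


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = relevant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# 70/30 split of the 1,826 coded tweets reported in the abstract.
train_idx, test_idx = split_indices(1826)
print(len(train_idx), len(test_idx))

# Reported adjusted odds ratios expressed as percent lower odds.
print(round(percent_lower_odds(0.404)))  # epidemiology, OECD vs. low-income
print(round(percent_lower_odds(0.213)))  # sub-populations, lower-middle vs. low-income
```

In a classroom setting, the index lists returned by `split_indices` would be used to select the coded tweets for fitting and evaluating a classifier, and `f1_score` would be applied to the classifier's predictions on the held-out tweets.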
