Analyzing COVID-19 Tweets using Health Behaviour Theories and Machine Learning

Health Behaviour Theories have been used to analyze social media posts during previous incidents in order to explain people's health habits. For the COVID-19 pandemic, social media data can expose public attitudes and experiences and reveal factors that impede or encourage efforts to reduce the spread of the disease. This paper uses Health Behaviour Theories (Health Belief Model, Social Norm, and Trust) and Machine Learning to examine people's behaviours and reactions toward COVID-19. First, we extract COVID-19 comments from Twitter and label them using candidate keyphrases that represent each health behaviour construct. Next, we develop three machine learning classifiers, Support Vector Machine (SVM), Decision Tree (DT), and Logistic Regression (LR), to automatically classify comments into the appropriate constructs. We train and evaluate the models using 10-fold cross-validation and compare their performance on precision, recall, and F1-score. Our results show that DT and SVM perform best for multiclass (single-label) classification, with an overall F1-score of up to 98%, while DT outperforms the other classifiers for multiclass-multilabel classification, with an overall F1-score of up to 100%. Finally, we conduct a thematic analysis of the comments in each construct to identify meaningful themes that represent key issues related to the COVID-19 pandemic. Our findings reveal 31 themes across all constructs.
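
To make the labelling and evaluation steps concrete, the following is a minimal sketch in Python. It is an illustration only, not the implementation used in this work: the construct names, the keyphrases in `KEYPHRASES`, and the helper functions `label_comment` and `precision_recall_f1` are all hypothetical, and the SVM, DT, and LR classifiers themselves are omitted. The sketch assumes simple case-insensitive substring matching for keyphrase labelling, which permits a comment to receive more than one construct (the multilabel case).

```python
# Hypothetical keyphrases per construct, invented for illustration.
# They are NOT the candidate keyphrases used in the paper.
KEYPHRASES = {
    "perceived_susceptibility": ["at risk", "could catch"],
    "perceived_barriers": ["can't afford", "no access"],
    "social_norm": ["everyone is wearing", "my friends"],
    "trust": ["trust the", "don't believe"],
}


def label_comment(text, keyphrases=KEYPHRASES):
    """Return the set of constructs whose keyphrases occur in the text.

    Matching is a case-insensitive substring test (an assumption), so a
    comment may receive zero, one, or several construct labels.
    """
    lowered = text.lower()
    return {
        construct
        for construct, phrases in keyphrases.items()
        if any(phrase in lowered for phrase in phrases)
    }


def precision_recall_f1(true_labels, predicted_labels, construct):
    """Compute precision, recall, and F1 for one construct.

    Both arguments are sequences of label sets, one set per comment.
    """
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        in_truth = construct in truth
        in_predicted = construct in predicted
        if in_truth and in_predicted:
            tp += 1
        elif in_predicted:
            fp += 1
        elif in_truth:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    comment = "I trust the health officials, and my friends all wear masks."
    print(sorted(label_comment(comment)))
```

In a cross-validated setting, the per-construct scores from `precision_recall_f1` would be computed on each held-out fold and then averaged; how the overall scores are aggregated across constructs is a design choice this sketch leaves open.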