Understanding a happiness dataset: How the machine learning classification accuracy changes with different demographic groups

In this paper, we use HappyDB, a corpus of more than 100,000 happy moments (happiness statements), to train machine learning classifiers that assign each statement to a category such as Achievement or Affection. Three different classifiers were initially compared on this task to determine their classification accuracy. The best-performing model, a convolutional neural network (CNN), was then used for further analysis across cross-sectional demographic groups. We assessed whether this classifier's performance varied when it was tested on happiness statements from different demographic groups: statements written by married or single people, females or males, younger or older people, and parents or non-parents. The CNN achieved an F1 score of 0.897 overall, but its performance varied across these groups. In general, prediction accuracy declined with age: for certain sub-groups, results declined or flattened out as age increased, the exception being the single-parent sub-group. This may be due to the smaller number of statements from these sub-groups; with sparse training data, the algorithm may not have learned the patterns in the happiness statements of these cohorts. The results suggest that word patterns in happiness statements likely differ across demographic groups.
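The per-group evaluation described above amounts to splitting the test predictions by a demographic attribute and scoring each split separately. The following is a minimal sketch of that step, assuming predictions have already been produced by a trained classifier and that each statement carries a demographic label. The group names, categories, and records are hypothetical toy values, not HappyDB data. The abstract does not say whether the reported F1 is macro- or micro-averaged, so macro-averaging is an assumption here.

```python
from collections import defaultdict


def f1_for_label(y_true, y_pred, label):
    """One-vs-rest F1 for a single category label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


def f1_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for group, true_label, pred_label in records:
        grouped[group][0].append(true_label)
        grouped[group][1].append(pred_label)
    return {g: macro_f1(t, p) for g, (t, p) in grouped.items()}


# Hypothetical toy predictions, for illustration only.
records = [
    ("parent", "achievement", "achievement"),
    ("parent", "affection", "affection"),
    ("non-parent", "achievement", "achievement"),
    ("non-parent", "affection", "achievement"),
]

scores = f1_by_group(records)
print(scores)  # parent scores 1.0; non-parent scores about 0.33
```

Scores computed on small sub-groups are noisy, which is consistent with the abstract's caution that sparse sub-groups may give less reliable results. Reporting each group's size alongside its score would make that visible.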