A Semi-automatic Approach for Labeling Large Amounts of Automated and Non-automated Social Media User Accounts

Automated accounts are used for many purposes in social media, including sending spam, spreading viruses, and conducting psychological operations in political or military conflicts. While several previous attempts have been made to classify bot accounts in the spam domain, there are (to the best of our knowledge) no previous studies on the detection of automated accounts in a military information operation context. Traditional machine learning approaches to bot detection rely on manual annotation of the training sets from which classifiers are learnt, a process that requires a large manual effort. We present a semi-automated alternative to manual annotation that significantly reduces the effort and resources needed, and hence speeds up the process of adapting classifiers to new domains. Our application of the method to Twitter data from the Russia-Ukraine conflict, and the classification results we obtain, suggest that good classification performance can still be achieved when training sets are generated semi-automatically rather than annotated manually.