Paper information - A Semi-automatic Approach for Labeling Large Amounts of Automated and Non-automated Social Media User Accounts

Automated accounts are used for many purposes in social media, including sending spam, spreading viruses, and conducting psychological operations in political or military conflicts. While several previous attempts have been made to classify bot accounts in the spam domain, there are (to the best of our knowledge) no previous studies on the detection of automated accounts in a military information operation context. Traditional machine learning approaches to bot detection rely on manually annotated training sets from which classifiers can be learned, which requires a large manual effort. We present a semi-automated alternative to manual annotation that significantly reduces the effort and resources needed, and hence speeds up the process of adapting classifiers to new domains. Our application of the method to Twitter data from the Russia-Ukraine conflict, together with our classification results, suggests that good classification performance can still be obtained when training sets are generated semi-automatically rather than through manual annotation.
