Paper Information - Scam Detection in Twitter

Scam Detection in Twitter

Twitter is one of the fastest-growing social networking services. This growth has led to an increase in Twitter scams (e.g., intentional deception). Relatively little effort has gone into identifying scams on Twitter. In this chapter, we propose a semi-supervised Twitter scam detector that requires only a small amount of labeled data. The scam detector combines self-learning and clustering analysis. A suffix tree data structure is used. Model building based on the Akaike and Bayesian Information Criteria is investigated and combined with the classification step. Our experiments show that, among other results, 87% accuracy is achievable with only 9 labeled samples and 4000 unlabeled samples.
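The abstract names the Akaike and Bayes Information Criteria for model building but gives no detail. As a minimal sketch of how these two criteria are commonly used to choose among candidate cluster models, the snippet below applies the standard definitions (AIC = 2k - 2 ln L, BIC = k ln n - 2 ln L). The candidate models, parameter counts, and log-likelihood values are hypothetical and chosen only for illustration; they are not results from the paper, and the paper's actual procedure may differ.

```python
import math


def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return num_params * math.log(num_samples) - 2 * log_likelihood


# Hypothetical candidates: (number of clusters, parameter count, log-likelihood).
# These numbers are made up for illustration only.
candidates = [
    (2, 5, -1250.0),
    (3, 8, -1190.0),
    (4, 11, -1186.0),
]
num_samples = 4000  # assumed sample count, used only by the BIC penalty

# Pick the candidate with the lowest score under each criterion.
best_by_aic = min(candidates, key=lambda c: aic(c[2], c[1]))[0]
best_by_bic = min(candidates, key=lambda c: bic(c[2], c[1], num_samples))[0]

print("clusters chosen by AIC:", best_by_aic)
print("clusters chosen by BIC:", best_by_bic)
```

With these made-up values, AIC selects the 4-cluster model while BIC selects the 3-cluster model, because BIC's penalty per parameter (ln n) is larger than AIC's (2) once n exceeds about 7 samples.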
