Don't follow me: Spam detection in Twitter

The rapidly growing social network Twitter has been infiltrated by a large amount of spam. In this paper, a spam detection prototype system is proposed to identify suspicious users on Twitter. A directed social graph model is proposed to explore the “follower” and “friend” relationships among Twitter users. Based on Twitter's spam policy, novel content-based and graph-based features are also proposed to facilitate spam detection. A Web crawler relying on the API methods provided by Twitter is developed, and around 25K users, 500K tweets, and 49M follower/friend relationships in total are collected from publicly available data on Twitter. A Bayesian classification algorithm is applied to distinguish suspicious behaviors from normal ones. I analyze the data set and evaluate the performance of the detection system. Classic evaluation metrics are used to compare the performance of various traditional classification methods. Experimental results show that the Bayesian classifier has the best overall performance in terms of F-measure. When the trained classifier is applied to the entire data set, the spam detection system achieves 89% precision.
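The pipeline summarized above (a directed follower/friend graph, per-user features, a Bayesian classifier, and precision/F-measure evaluation) can be illustrated with a small plain-Python sketch. This is not the paper's implementation. The abstract does not list the actual features or the classifier variant, so the "reputation" ratio (share of a user's links that are inbound), the link and @-mention ratios, the 0.5 thresholds, the choice of a naive Bayes model over binary features, and all function and class names (`build_graph`, `reputation`, `features`, `BinaryNaiveBayes`, `precision_recall_f1`) are hypothetical.

```python
import math
from collections import defaultdict


def build_graph(edges):
    """Directed graph: an edge (a, b) means "a follows b".

    b is then one of a's friends (out-neighbour) and a is one of
    b's followers (in-neighbour)."""
    friends = defaultdict(set)
    followers = defaultdict(set)
    for a, b in edges:
        friends[a].add(b)
        followers[b].add(a)
    return friends, followers


def reputation(user, friends, followers):
    # Hypothetical graph-based feature: fraction of the user's links that are inbound.
    n_in, n_out = len(followers[user]), len(friends[user])
    return n_in / (n_in + n_out) if n_in + n_out else 0.0


def features(user, tweets, friends, followers):
    # Hypothetical binary features; thresholds are illustrative only.
    texts = tweets.get(user, [])
    n = max(len(texts), 1)
    link_ratio = sum("http" in t for t in texts) / n
    mention_ratio = sum("@" in t for t in texts) / n
    return (
        reputation(user, friends, followers) < 0.5,  # follows many, followed by few
        link_ratio > 0.5,                            # mostly link-carrying tweets
        mention_ratio > 0.5,                         # mostly @-mentions
    )


class BinaryNaiveBayes:
    """Naive Bayes over binary features with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # P(feature j is True | class c), smoothed so it is never 0 or 1.
            self.p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                         for j in range(len(X[0]))]
        return self

    def predict(self, x):
        def log_posterior(c):
            return self.log_prior[c] + sum(
                math.log(p if xj else 1 - p) for xj, p in zip(x, self.p[c]))
        return max(self.classes, key=log_posterior)


def precision_recall_f1(y_true, y_pred, positive="spam"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy, made-up data: one account that follows everyone and posts links/mentions.
    edges = [("bot", "alice"), ("bot", "bob"), ("bot", "carol"),
             ("alice", "bob"), ("bob", "alice"), ("carol", "alice")]
    tweets = {
        "bot": ["@alice win http://x.example", "@bob free http://y.example"],
        "alice": ["hello world", "lunch @bob?"],
        "bob": ["reading a paper"],
        "carol": ["nice day"],
    }
    friends, followers = build_graph(edges)
    users = ["bot", "alice", "bob", "carol"]
    X = [features(u, tweets, friends, followers) for u in users]
    y = ["spam", "normal", "normal", "normal"]
    clf = BinaryNaiveBayes().fit(X, y)
    preds = [clf.predict(x) for x in X]
    print(preds, precision_recall_f1(y, preds))
```

On this toy data the classifier separates the single spam-like account perfectly, which says nothing about real-world accuracy; the example only shows how graph and content signals feed a Bayesian model and how precision, recall, and F-measure are computed from its predictions.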

Alex Hai Wang