Spam Filtering in Twitter Using Sender-Receiver Relationship

Twitter is one of the most visited sites in these days. Twitter spam, however, is constantly increasing. Since Twitter spam is different from traditional spam such as email and blog spam, conventional spam filtering methods are inappropriate to detect it. Thus, many researchers have proposed schemes to detect spammers in Twitter. These schemes are based on the features of spam accounts such as content similarity, age and the ratio of URLs. However, there are two significant problems in using account features to detect spam. First, account features can easily be fabricated by spammers. Second, account features cannot be collected until a number of malicious activities have been done by spammers. This means that spammers will be detected only after they send a number of spam messages. In this paper, we propose a novel spam filtering system that detects spam messages in Twitter. Instead of using account features, we use relation features, such as the distance and connectivity between a message sender and a message receiver, to decide whether the current message is spam or not. Unlike account features, relation features are difficult for spammers to manipulate and can be collected immediately. We collected a large number of spam and non-spam Twitter messages, and then built and compared several classifiers. From our analysis we found that most spam comes from an account that has less relation with a receiver. Also, we show that our scheme is more suitable to detect Twitter spam than the previous schemes.

[1]  O. Perron Zur Theorie der Matrices , 1907 .

[2]  K. Menger Zur allgemeinen Kurventheorie , 1927 .

[3]  James P. Keener,et al.  The Perron-Frobenius Theorem and the Ranking of Football Teams , 1993, SIAM Rev..

[4]  Susan T. Dumais,et al.  A Bayesian Approach to Filtering Junk E-Mail , 1998, AAAI 1998.

[5]  R. Aharoni,et al.  Menger’s theorem for infinite graphs , 2005, math/0509397.

[6]  Amy Nicole Langville,et al.  A Survey of Eigenvector Methods for Web Information Retrieval , 2005, SIAM Rev..

[7]  Marc Najork,et al.  Detecting spam web pages through content analysis , 2006, WWW '06.

[8]  Michael Kaminsky,et al.  SybilGuard: defending against sybil attacks via social networks , 2006, SIGCOMM.

[9]  Calton Pu,et al.  Social Honeypots: Making Friends With A Spammer Near You , 2008, CEAS.

[10]  Ciro Cattuto,et al.  Social spam detection , 2009, AIRWeb '09.

[11]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[12]  Lawrence K. Saul,et al.  Identifying suspicious URLs: an application of large-scale online learning , 2009, ICML '09.

[13]  Enrico Blanzieri,et al.  A survey of learning-based techniques of email spam filtering , 2008, Artificial Intelligence Review.

[14]  Danah Boyd,et al.  Detecting Spam in a Twitter Network , 2009, First Monday.

[15]  Sushil Jajodia,et al.  Who is tweeting on Twitter: human, bot, or cyborg? , 2010, ACSAC '10.

[16]  Kyumin Lee,et al.  Uncovering social spammers: social honeypots + machine learning , 2010, SIGIR.

[17]  Vern Paxson,et al.  @spam: the underground on 140 characters or less , 2010, CCS '10.

[18]  David J. Brenes,et al.  Overcoming Spammers in Twitter – A Tale of Five Algorithms1 , 2010 .

[19]  Alex Hai Wang,et al.  Don't follow me: Spam detection in Twitter , 2010, 2010 International Conference on Security and Cryptography (SECRYPT).

[20]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[21]  Virgílio A. F. Almeida,et al.  Detecting Spammers on Twitter , 2010 .

[22]  Feng Xiao,et al.  SybilLimit: A Near-Optimal Social Network Defense Against Sybil Attacks , 2010, IEEE/ACM Trans. Netw..

[23]  Gianluca Stringhini,et al.  Detecting spammers on social networks , 2010, ACSAC '10.

[24]  Dawn Xiaodong Song,et al.  Design and Evaluation of a Real-Time URL Spam Filtering Service , 2011, 2011 IEEE Symposium on Security and Privacy.