Modeling and Evaluating Information Diffusion for Spam Detection in Micro-blogging Networks

Spam has become one of the top threats to micro-blogging networks, taking the form of rumor spreading, advertisement abuse, and malware distribution. As micro-blogging grows in popularity, the problem will worsen. Prior detection tools are either designed for specific types of spam or are not robust enough: spammers can easily evade detection by adjusting their behavior. In this paper, we present a novel model to quantitatively evaluate information diffusion in micro-blogging networks. Under this model, we find that spam posts differ markedly from non-spam posts. First, the propagation of non-spam posts mostly comes from the posters' followers, whereas that of spam posts comes mainly from strangers. Second, non-spam posts last relatively longer than spam posts. Third, non-spam posts always get their first reposts/comments much sooner than spam posts. With the features defined in our model, we propose an RBF-based approach to detect spam. Unlike previous work, in which features are extracted from individual profiles or content, the diffusion features are determined not by any single user but by the crowd. Our method is therefore more robust, because a change in any single user’s behavior will not affect its effectiveness. Moreover, although spam varies in type and form, it is propagated in the same way, so our method is effective for all types of spam. Using real data crawled from the leading micro-blogging services in China, we evaluate the effectiveness of our model. The experimental results show that our model achieves high precision and recall.
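
The three diffusion signals named in the abstract (share of propagation coming from followers, how long a post keeps spreading, and the delay before the first repost/comment) can be illustrated with a minimal sketch. The abstract does not give the paper's exact feature definitions, so the names `Repost`, `diffusion_features`, `follower_ratio`, `lifetime`, and `first_response_delay`, and the way each quantity is computed, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Repost:
    """One repost or comment on a post (hypothetical record layout)."""
    user_id: str
    timestamp: float  # seconds, same clock as the post time


def diffusion_features(
    post_time: float,
    author_followers: Set[str],
    reposts: List[Repost],
) -> Dict[str, Optional[float]]:
    """Compute three illustrative diffusion features for a single post.

    These definitions are assumptions, not the paper's formulas:
      follower_ratio       fraction of reposts made by the author's followers
      lifetime             time from posting to the last observed repost
      first_response_delay time from posting to the first observed repost
    """
    if not reposts:
        # A post that never propagated has no measurable delay.
        return {
            "follower_ratio": 0.0,
            "lifetime": 0.0,
            "first_response_delay": None,
        }

    times = sorted(r.timestamp for r in reposts)
    from_followers = sum(1 for r in reposts if r.user_id in author_followers)

    return {
        "follower_ratio": from_followers / len(reposts),
        "lifetime": times[-1] - post_time,
        "first_response_delay": times[0] - post_time,
    }


# Toy example: two of four reposts come from followers.
example = diffusion_features(
    post_time=100.0,
    author_followers={"a", "b"},
    reposts=[
        Repost("a", 110.0),
        Repost("c", 160.0),
        Repost("b", 400.0),
        Repost("d", 130.0),
    ],
)
```

Because every value is aggregated over all users who repost, changing one account's behavior shifts these numbers only slightly, which is the robustness argument made above. Feature vectors of this kind could then be passed to a classifier; the abstract says only that the approach is RBF-based, so the specific model is left open here.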
