Detecting cooperative and organized spammer groups in micro-blogging community

In recent years, social spammers have become rampant and have evolved into a number of variations across most social networks. In the micro-blogging community, one typical type of anomalous group consists of cooperative, organized spammers who are hired by public relations companies and paid to post tweets with specific content. They deliberately evolve their content and behavior patterns to evade detection, and they cooperatively hijack trending topics to push a particular point of view, which can seriously affect people’s judgments and decisions. Because these spammers keep evolving and hide their behavior, detecting such groups raises two issues. The first is to detect the anomalous topics hijacked by spammer groups among the many trending topics. The second is to detect the members of a spammer group among the users who join the anomalous topics. In this paper, we propose a two-stage topology-based method to detect spammer groups that are partially distributed across multiple trending topics. In the first stage, we detect anomalous topics among the trending topics using a new similarity measure based on subgraph ranking. A topic is identified as anomalous if the topological characteristics of its retweeting network change dramatically between adjacent periods. In the second stage, starting from a few initially labeled spammers, we obtain several anomalous topic sequences by employing the basic idea of label propagation, and we cluster the users who join each topic sequence into group spammers and normal users according to their total authority. A user's total authority is the weighted sum of his or her authorities over the anomalous topics of a topic sequence, where the authority in each topic is defined from the user's out-degree in the retweeting network.
Experimental results on real-world data collected from the Sina micro-blogging site show that our similarity measure leads on all evaluation metrics and that our method detects group spammers effectively in comparison with other methods.
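
The second stage can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: each retweeting network is given as a list of `(source, target)` edges, a user's per-topic authority is taken to be his or her share of the total out-degree, topic weights default to 1, and the split into two clusters uses a simple one-dimensional 2-means. All function names are hypothetical.

```python
from collections import defaultdict


def out_degree_authority(retweet_edges):
    """Per-topic authority: a user's share of the total out-degree (assumed form)."""
    out_deg = defaultdict(int)
    for src, dst in retweet_edges:
        out_deg[src] += 1
        out_deg[dst] += 0  # register users that only appear as edge targets
    total = sum(out_deg.values())
    return {u: (d / total if total else 0.0) for u, d in out_deg.items()}


def total_authority(topic_edges, weights=None):
    """Weighted sum of per-topic authorities over one topic sequence."""
    if weights is None:
        weights = [1.0] * len(topic_edges)  # assumption: uniform topic weights
    totals = defaultdict(float)
    for w, edges in zip(weights, topic_edges):
        for user, authority in out_degree_authority(edges).items():
            totals[user] += w * authority
    return dict(totals)


def split_by_authority(totals, iters=50):
    """Split users into low- and high-authority clusters with 1-D 2-means."""
    values = list(totals.values())
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        if not high:  # all users have the same total authority
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    threshold = (lo + hi) / 2
    high_users = {u for u, v in totals.items() if v > threshold}
    return set(totals) - high_users, high_users


# Toy topic sequence with two anomalous topics (made-up edges).
topics = [
    [("a", "b"), ("a", "c"), ("d", "a")],
    [("a", "b"), ("e", "b")],
]
totals = total_authority(topics)
low_users, high_users = split_by_authority(totals)
```

The sketch only separates users into a low-authority and a high-authority cluster. Which cluster corresponds to group spammers, and how the topic weights are chosen, depends on details of the method that the abstract does not give.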
