Discovering social spammers from multiple views

Online social networks have become popular platforms for spammers to spread malicious content and links. Existing state-of-the-art optimization methods mainly use one kind of user-generated information (i.e., a single view) to learn a classification model for identifying spammers. Because spammers' strategies are diverse and change over time, a single view may not fully characterize their behavior. To tackle this challenge, we first statistically analyze the importance of considering multiple-view information for the spammer detection task on a large real-world Twitter dataset. Accordingly, we propose a generalized social spammer detection framework that jointly integrates multiple-view information and a novel social regularization term into a classification model. To preserve the completeness of the original dataset and allow the proposed method to detect more spammers, we introduce a simple strategy to fill in the missing data for each view. Experimental results on a real-world Twitter dataset show that the proposed method outperforms the existing methods significantly.

Highlights
We propose a generalized social spammer detection framework.
The framework integrates multiple-view information and a novel social regularization.
Results on a real-world dataset demonstrate the effectiveness of the framework.
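The three ingredients named in the abstract (several views per user, a fill-in step for missing view data, and a social regularization term added to a classifier) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' formulation: missing entries are filled with per-column means, views are simply concatenated, logistic regression stands in for the unspecified classification model, and the social term is a graph-Laplacian penalty that pushes connected users toward similar scores. The function names `fill_missing_view`, `build_design_matrix`, `train`, and `predict` are hypothetical.

```python
import numpy as np

def fill_missing_view(X):
    """Fill NaN entries of one view with the column mean of observed values.
    (Assumed strategy; the abstract only says the fill-in is 'simple'.)"""
    X = np.asarray(X, dtype=float).copy()
    col_mean = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]
    return X

def build_design_matrix(views):
    """Concatenate filled views, standardize columns, append a bias column.
    (Concatenation is an assumption, not the paper's integration scheme.)"""
    X = np.hstack([fill_missing_view(V) for V in views])
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train(X, A, y, alpha=0.1, lam=0.01, lr=0.1, steps=500):
    """Gradient descent on: logistic loss + alpha * s^T L s / n + lam * ||w||^2,
    where s = X w and L = D - A is the Laplacian of the social graph A."""
    n, d = X.shape
    L = np.diag(A.sum(axis=1)) - A
    w = np.zeros(d)
    for _ in range(steps):
        s = X @ w
        p = 1.0 / (1.0 + np.exp(-s))          # predicted spammer probability
        grad = (X.T @ (p - y) / n             # logistic loss gradient
                + 2 * alpha * X.T @ (L @ s) / n  # social smoothness gradient
                + 2 * lam * w)                # ridge penalty gradient
        w -= lr * grad
    return w

def predict(X, w):
    """Label a user as spammer (1) when the linear score is positive."""
    return (X @ w > 0).astype(int)
```

A call would look like `X = build_design_matrix([content_view, behavior_view])` followed by `w = train(X, A, y)`, where `content_view`, `behavior_view`, the symmetric adjacency matrix `A`, and the 0/1 label vector `y` are placeholder inputs. The Laplacian penalty is one common way to encode the idea that socially linked users tend to share labels; the paper's own regularizer may differ.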
