Online Social Spammer Detection

The explosive growth of social media has also made it a popular platform for malicious users, known as social spammers, who overwhelm normal users with unwanted content. One effective approach to social spammer detection is to build a classifier based on content and social network information. However, social spammers are sophisticated and adapt to game the system with fast-evolving content and network patterns. First, social spammers continually change their spamming content patterns to avoid detection. Second, reflexive reciprocity (users following back those who follow them) makes it easier for social spammers to establish social influence and pass as normal users by quickly accumulating a large number of "human" friends. Existing anti-spamming systems based on batch-mode learning struggle to respond quickly to newly emerging patterns, which limits their effectiveness for social spammer detection. In this paper, we present a general optimization framework that collectively uses content and network information for social spammer detection, and we provide a solution for efficient online processing. Experimental results on Twitter datasets confirm the effectiveness and efficiency of the proposed framework.
