Social Spammer Detection with Sentiment Information

Social media has become a popular platform for spammers, who overwhelm normal users with unwanted or fake content. These spammers significantly hinder the use of social media systems for effective information dissemination and sharing. Unlike spammers on traditional platforms such as email and the Web, spammers in social media can easily connect with each other, sometimes without mutual consent. They collude to imitate normal users by quickly accumulating a large number of "human" friends. In addition, content in social media is noisy and unstructured. For these reasons, it is infeasible to directly apply traditional spammer detection methods to social media. Understanding and detecting deception has been studied extensively in sociology and the social sciences. Motivated by psychological findings from the physical world, we investigate whether sentiment analysis can help spammer detection in online social media. In particular, we first conduct an exploratory study of the sentiment differences between spammers and normal users, and then present an optimization formulation that incorporates sentiment information into a novel social spammer detection framework. Experimental results on real-world social media datasets show that the proposed framework, by harnessing sentiment analysis, achieves superior performance in social spammer detection.
