A social-spam detection framework

Social networks such as Facebook, MySpace, and Twitter have become increasingly important for reaching millions of users. Consequently, spammers are increasingly using such networks to propagate spam. Although existing filtering techniques such as collaborative filters and behavioral analysis filters can significantly reduce spam, each social network must build its own independent spam filter and support a spam team to keep its spam prevention techniques current. We propose a framework for spam detection that can be used across all social network sites. The framework has numerous benefits, including: 1) new spam detected on one social network can quickly be identified across social networks; 2) the accuracy of spam detection will improve with a large amount of data from across social networks; 3) other techniques (such as blacklists and message shingling) can be integrated and centralized; 4) new social networks can plug into the system easily, preventing spam at an early stage. We provide an experimental study of real datasets from social networks to demonstrate the flexibility and feasibility of our framework.
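To make benefits 1) and 3) concrete, the following is a minimal sketch of how message shingling could back a centralized index shared by several networks. It is an illustration under stated assumptions, not the framework's actual implementation: the class `SharedShingleIndex`, its methods, the 3-word shingle size, and the 0.5 Jaccard threshold are all hypothetical choices made for this example.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (as tuples) in a message."""
    words = text.lower().split()
    if len(words) < k:
        # Short messages yield a single shingle, or none if empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class SharedShingleIndex:
    """Hypothetical central store of spam shingles reported by any network."""

    def __init__(self, threshold=0.5, k=3):
        self.threshold = threshold  # assumed similarity cut-off
        self.k = k                  # assumed shingle size in words
        self.known_spam = []        # list of (network name, shingle set)

    def report_spam(self, network, text):
        """Record a message that one network has labeled as spam."""
        self.known_spam.append((network, shingles(text, self.k)))

    def is_near_duplicate(self, text):
        """Check a message from any network against all reported spam."""
        candidate = shingles(text, self.k)
        return any(
            jaccard(candidate, known) >= self.threshold
            for _, known in self.known_spam
        )


# Spam reported by one network is recognized when it appears on another.
index = SharedShingleIndex()
index.report_spam("twitter", "click here to win a free phone today")
flagged = index.is_near_duplicate("Click here to win a free phone now")
clean = index.is_near_duplicate("see you at lunch tomorrow my friend")
```

The linear scan over `known_spam` keeps the sketch short; a deployment at the scale the abstract describes would presumably need an indexed or hashed lookup instead.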
