Detecting Spam URLs in Social Media via Behavioral Analysis

This paper addresses the challenge of detecting spam URLs in social media, an important task for shielding users from links associated with phishing, malware, and other low-quality or suspicious content. Rather than relying on traditional blacklist-based filters or content analysis of a URL's landing page, we examine the behavior of both the users who post a URL and the users who click on it. The core intuition is that these behavioral signals may be more difficult to manipulate than traditional signals. Concretely, we propose and evaluate fifteen click-based and posting-based features. Through extensive experimental evaluation, we find that this purely behavioral approach can achieve high precision (0.86), recall (0.86), and area under the curve (AUC, 0.92), suggesting the potential for robust behavior-based spam detection.
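As a rough illustration of how the three reported metrics are computed, the following minimal sketch evaluates a hypothetical URL classifier on made-up data. The labels, scores, and the 0.5 decision threshold are assumptions for illustration only and are not taken from the paper's dataset or model; the sketch also assumes that "area under the curve" refers to the ROC curve.

```python
from typing import Sequence, Tuple


def precision_recall(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive (spam = 1) class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random spam URL is scored
    higher than a random non-spam URL (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one spam and one non-spam example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth (1 = spam URL) and classifier scores.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

# Assumed decision threshold; the paper does not state one here.
preds = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(labels, preds)
auc = roc_auc(labels, scores)
print(f"precision={precision:.2f} recall={recall:.2f} auc={auc:.4f}")
```

Precision and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the three numbers are usually reported together.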
