Spamology: A Study of Spam Origins

The rise of spam over the last decade has been staggering, with spam volume now exceeding that of legitimate email. While there is conjecture about how spammers obtain the email addresses they target, most work on spam containment has focused either on better spam-filtering methods or on understanding the botnets commonly used to send spam. In this paper, we aim to understand the origins of spam. We post dedicated email addresses to record how and where spammers go to obtain email addresses. We find that posting an email address on public Web pages yields immediate, high-volume spam. Surprisingly, even simple email obfuscation approaches are still sufficient today to prevent spammers from harvesting addresses. We also find that attempts to find open relays continue to be popular among spammers. The insights we gain into the Web crawlers used to harvest email addresses, and into the techniques spammers have in common, open the door to radically different follow-up work on spam containment and even to systematic enforcement of spam legislation at a large scale.
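To make the obfuscation finding concrete, the sketch below shows why a purely pattern-based harvester misses even trivially obfuscated addresses. The abstract does not describe the harvesters' actual matching logic, so the regular expression, the `[at]`/`[dot]` obfuscation scheme, and the helper names `harvest` and `obfuscate` are all hypothetical assumptions chosen for illustration, not the crawlers observed in the study.

```python
import re

# Hypothetical harvester pattern (an assumption, not taken from the study):
# a local part, a literal "@", then a dotted domain ending in a 2+ letter TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(page_text: str) -> list[str]:
    """Return every substring of page_text that matches EMAIL_RE."""
    return EMAIL_RE.findall(page_text)


def obfuscate(address: str) -> str:
    """Hypothetical scheme: rewrite 'user@example.com' as 'user [at] example [dot] com'."""
    local, domain = address.split("@", 1)
    return f"{local} [at] {domain.replace('.', ' [dot] ')}"


plain_page = "Contact: alice@example.com for details."
obfuscated_page = f"Contact: {obfuscate('alice@example.com')} for details."

print(harvest(plain_page))       # ['alice@example.com']
print(obfuscated_page)           # Contact: alice [at] example [dot] com for details.
print(harvest(obfuscated_page))  # []
```

Because the obfuscated text contains no literal `@`, the pattern never matches. A harvester that anticipated this particular scheme could easily reverse it, which is part of what makes the effectiveness of simple obfuscation notable.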
