Spamscatter: Characterizing Internet Scam Hosting Infrastructure

Unsolicited bulk e-mail, or spam, is a means to an end. For virtually all such messages, the intent is to entice the recipient into a commercial transaction -- typically via a linked Web site. While the prodigious infrastructure used to pump out billions of such solicitations is essential, the engine driving this process is ultimately the "point-of-sale" -- the various money-making "scams" that extract value from Internet users. In the hope of better understanding the business pressures exerted on spammers, this paper focuses squarely on the Internet infrastructure used to host and support such scams. We describe an opportunistic measurement technique called spamscatter that mines e-mail in real time, follows the embedded link structure, and automatically clusters the destination Web sites using image shingling to capture graphical similarity between rendered sites. We have implemented this approach on a large real-time spam feed (over 1M messages per week) and have identified and analyzed over 2,000 distinct scams on 7,000 distinct servers.
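
To make the clustering step concrete, here is a minimal sketch of image shingling. It is an illustration under stated assumptions, not the authors' implementation: screenshots are modeled as 2D lists of grayscale values (0-255), and the tile size, the SHA-1 tile hash, the Jaccard similarity measure, the 0.7 threshold, the greedy clustering strategy, and the function names `image_shingles`, `similarity`, and `cluster` are all hypothetical choices made for this example.

```python
import hashlib


def image_shingles(pixels, block=4):
    """Hash every non-overlapping block x block tile of a 2D grayscale image.

    `pixels` is a list of rows of ints in 0..255 (an assumed representation).
    Partial tiles at the right and bottom edges are ignored.
    """
    height, width = len(pixels), len(pixels[0])
    shingles = set()
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            tile = bytes(
                pixels[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            shingles.add(hashlib.sha1(tile).hexdigest())
    return shingles


def similarity(a, b):
    """Jaccard resemblance of two shingle sets (an assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(screenshots, threshold=0.7, block=4):
    """Greedy clustering: each screenshot joins the first cluster whose
    representative shares at least `threshold` of its shingles, otherwise
    it starts a new cluster."""
    clusters = []  # list of (representative_shingles, member_names)
    for name, pixels in screenshots.items():
        shingles = image_shingles(pixels, block)
        for representative, members in clusters:
            if similarity(representative, shingles) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((shingles, [name]))
    return [members for _, members in clusters]
```

Because each tile is hashed independently, a small localized change to a rendered page (a rotated banner, a different tracking image) alters only the tiles it touches, so two renderings of the same scam still share most of their shingles. A real pipeline would render each landing page to a bitmap first; that step is omitted here.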
