Filtering spam with behavioral blacklisting

Spam filters often use the reputation of an IP address (or IP address range) to classify email senders. This approach worked well when most spam originated from senders with fixed IP addresses, but spam today is also sent from IP addresses for which blacklist maintainers have outdated or inaccurate information (or no information at all). Spam campaigns also involve many senders, which reduces the amount of spam any particular IP address sends to a single domain and allows spammers to stay "under the radar". Because the set of IP addresses sending spam changes constantly, blacklisting techniques must adapt automatically as the senders of spam change. This paper presents SpamTracker, a spam filtering system that uses a new technique called behavioral blacklisting to classify email senders based on their sending behavior rather than their identity. Spammers cannot evade SpamTracker merely by using "fresh" IP addresses, because blacklisting decisions are based on sending patterns, which tend to remain stable even when the sending addresses change. SpamTracker uses fast clustering algorithms that react quickly to changes in sending patterns. We evaluate SpamTracker's ability to classify spammers using email logs for over 115 email domains; we find that SpamTracker can correctly classify many spammers missed by current filtering techniques. Although our current datasets prevent us from confirming SpamTracker's ability to completely distinguish spammers from legitimate senders, our evaluation shows that SpamTracker can identify a significant fraction of spammers that current IP-based blacklists miss. SpamTracker's ability to identify spammers before existing blacklists do suggests that it can be used in conjunction with existing techniques (e.g., as an input to greylisting). SpamTracker is inherently distributed and can be easily replicated; incorporating it into existing email filtering infrastructures requires only small modifications to mail server configurations.
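
The idea of classifying a sender by its sending pattern rather than its IP address can be illustrated with a small sketch. This is not SpamTracker's algorithm: the abstract says only that the system uses fast clustering, so the sketch assumes the spam clusters are already given and swaps in a plain cosine-similarity comparison against cluster centroids. The per-domain counts, the cluster contents, and the `THRESHOLD` value are all hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)

def centroid(patterns):
    """Unit-length average of the sending patterns in one cluster."""
    n = len(patterns)
    return normalize([sum(col) / n for col in zip(*patterns)])

def spam_score(pattern, centroids):
    """Highest cosine similarity between a sender and any spam centroid."""
    p = normalize(pattern)
    return max(sum(a * b for a, b in zip(p, c)) for c in centroids)

# Hypothetical data: each row is one known spamming IP, each column the
# number of emails it sent to one of four recipient domains.
cluster_a = [[5, 4, 6, 5], [4, 5, 5, 6]]   # spreads thinly over all domains
cluster_b = [[0, 9, 0, 8], [1, 8, 0, 9]]   # targets two domains only
centroids = [centroid(cluster_a), centroid(cluster_b)]

THRESHOLD = 0.9  # assumed cutoff, chosen only for this illustration

fresh_ip = [1, 1, 1, 1]    # never seen before, low volume per domain
legit_ip = [12, 0, 0, 0]   # sends to a single domain

for name, pattern in [("fresh_ip", fresh_ip), ("legit_ip", legit_ip)]:
    score = spam_score(pattern, centroids)
    label = "spam-like" if score > THRESHOLD else "not flagged"
    print(f"{name}: score={score:.3f} -> {label}")
```

In this toy data the previously unseen IP address is flagged even though it sends only one message per domain, because the shape of its sending pattern matches a known cluster; an IP-reputation lookup would have no information about it.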
