Towards modeling legitimate and unsolicited email traffic using social network properties

Identifying unsolicited email based on its network-level behavior rather than its content has received considerable interest. In this study, we investigate the social network properties of large-scale email networks generated from real email traffic to reveal the properties that are indicative of spam, as opposed to the expected legitimate behavior. By analyzing the structural and temporal properties of these email networks, we confirm that legitimate email traffic generates a small-world, scale-free network similar to other social networks. However, because email traffic as a whole also contains unsolicited email, the structure of the overall email network deviates from that of a social network. Our study points out the distinctive characteristics of spam traffic and reveals that the anomalies in the structural properties of email networks are due to the unsocial behavior of spam.
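The small-world and scale-free properties mentioned above are usually checked with a handful of standard graph statistics. The sketch below is a minimal illustration, not the authors' pipeline. It assumes the traffic has already been reduced to (sender, recipient) address pairs and treats the network as undirected and unweighted, which is a simplifying assumption. All function names are hypothetical. The degree exponent uses the approximate discrete maximum-likelihood estimator of Clauset, Shalizi and Newman, a swapped-in choice since the abstract does not say how the power law was fitted.

```python
import math
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: address -> set of addresses it exchanged mail with.
Graph = Dict[str, Set[str]]


def build_email_graph(messages: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected, unweighted graph with one edge per distinct address pair."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for sender, recipient in messages:
        if sender == recipient:
            continue  # drop self-addressed mail (no self-loops)
        graph[sender].add(recipient)
        graph[recipient].add(sender)
    return dict(graph)


def avg_clustering(graph: Graph) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    if not graph:
        return 0.0
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Each edge between two neighbours is counted once from each end.
        links = sum(len(graph[u] & nbrs) for u in nbrs) / 2
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def avg_path_length(graph: Graph) -> float:
    """Mean BFS shortest-path length over all reachable ordered node pairs."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


def powerlaw_alpha(degrees: List[int], k_min: int = 1) -> float:
    """Approximate discrete MLE of alpha in P(k) ~ k^-alpha for k >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    if k_min < 1 or not tail:
        raise ValueError("need k_min >= 1 and at least one degree >= k_min")
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)


def small_world_summary(graph: Graph) -> Dict[str, float]:
    """Compare C and L with Erdos-Renyi baselines of equal size and mean degree."""
    n = len(graph)
    if n == 0:
        raise ValueError("empty graph")
    mean_k = sum(len(nbrs) for nbrs in graph.values()) / n
    return {
        "C": avg_clustering(graph),
        "C_rand": mean_k / n,
        "L": avg_path_length(graph),
        "L_rand": math.log(n) / math.log(mean_k) if mean_k > 1 else float("inf"),
    }
```

The baselines C_rand ≈ ⟨k⟩/n and L_rand ≈ ln n / ln ⟨k⟩ are the textbook random-graph approximations. A network is conventionally called small-world when C is much larger than C_rand while L stays close to L_rand. On a real trace one would also select k_min from the data rather than defaulting to 1, because the estimator above is only accurate in the tail of the distribution.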
