Adaptive e-mails intention finding system based on words social networks

Although many anti-spam techniques have been proposed to date, no foolproof solution to spam has been found. Spammers continue to spread spam with invariant intentions such as advertising and phishing, and these intentions are difficult to detect with signature-based or content-based spam filters. In this study, we propose an adaptive e-mail intention finding system based on the E-mail Word Social Network (EWSN) that detects the intention of an e-mail and learns adaptively and continually. An EWSN is a data structure for profiling a user's intentions from solicited and unsolicited e-mails. The EWSNs are constructed from the information in the user's mailbox and from expanded social relations between words obtained via search engines on the World Wide Web. Unlike previous spam-filtering approaches, our system requires only a small amount of training data and can be trained incrementally through feedback. Quantitative experiments show that, with a limited amount of training data, the system achieves a better misclassification rate, precision, and recall than several content-based filtering methods. The results also show that the proposed method detects novel spam e-mails well without constant updates to the patterns of novel spam. The proposed method, capable of intention profiling and continual adaptation, is robust for detecting spam e-mails.
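To make the idea of a word network that profiles solicited and unsolicited mail more concrete, the following is a minimal sketch, not the authors' EWSN construction. It assumes that a "social relation" between two words can be approximated by their co-occurrence in the same e-mail, and it omits the search-engine expansion step entirely. The class name `WordNetworkProfile`, its methods, and the scoring rule are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


class WordNetworkProfile:
    """Toy word co-occurrence network per label (an assumption, not the paper's EWSN)."""

    LABELS = ("solicited", "unsolicited")

    def __init__(self):
        # One weighted edge set per label: (word_a, word_b) -> co-occurrence count.
        self.edges = {label: defaultdict(int) for label in self.LABELS}

    @staticmethod
    def _pairs(text):
        # Unordered word pairs of one e-mail, in a canonical (sorted) order.
        words = sorted(set(text.lower().split()))
        return combinations(words, 2)

    def learn(self, text, label):
        # Incremental update: one labelled e-mail (or one piece of user feedback).
        if label not in self.edges:
            raise ValueError(f"unknown label: {label}")
        for pair in self._pairs(text):
            self.edges[label][pair] += 1

    def score(self, text):
        # Positive when the e-mail's word pairs are more typical of unsolicited mail.
        total = 0
        for pair in self._pairs(text):
            total += self.edges["unsolicited"].get(pair, 0)
            total -= self.edges["solicited"].get(pair, 0)
        return total

    def classify(self, text):
        return "unsolicited" if self.score(text) > 0 else "solicited"


if __name__ == "__main__":
    profile = WordNetworkProfile()
    profile.learn("cheap pills buy now", "unsolicited")
    profile.learn("project meeting agenda tomorrow", "solicited")
    print(profile.classify("buy cheap pills"))
    # Feedback on a single misclassified message updates the profile in place.
    profile.learn("win lottery prize", "unsolicited")
    print(profile.classify("lottery prize win"))
```

Because `learn` only increments edge weights, feedback on individual messages can be folded in one at a time without retraining from scratch, which is the property the abstract refers to as incremental training through feedback.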
