A Supervised Approach for Spam Detection Using Text-Based Semantic Representation

In this paper, we propose an approach to email spam detection based on semantic analysis of email text at two levels. The first level categorizes emails into specific domains (e.g., health, education, and finance). The second level uses semantic features to detect spam within each domain. We show that the proposed method provides an efficient representation of the internal semantic structure of email content, which yields more precise and interpretable spam filtering results than existing methods.
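The two-level idea (route an email to a domain first, then judge spam with a model specific to that domain) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method. The abstract does not specify how domains are detected or which semantic features are used, so level 1 here relies on small hypothetical keyword lexicons and level 2 substitutes a multinomial naive Bayes classifier over bag-of-words tokens. The names `DOMAIN_KEYWORDS`, `assign_domain`, `NaiveBayes`, and `TwoLevelFilter`, the lexicon contents, and the toy training emails are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical level-1 lexicons: the abstract names health, education and
# finance as example domains but does not say how domains are detected.
DOMAIN_KEYWORDS = {
    "health": {"doctor", "pills", "medicine", "clinic", "weight"},
    "finance": {"loan", "bank", "credit", "invest", "money"},
    "education": {"course", "degree", "university", "exam", "class"},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def assign_domain(tokens):
    """Level 1: choose the domain whose lexicon matches the most tokens."""
    scores = {d: sum(t in kw for t in tokens) for d, kw in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (stand-in classifier)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, y in zip(docs, labels):
            self.word_counts[y].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_lp = None, -math.inf
        for y, count in self.class_counts.items():
            total = sum(self.word_counts[y].values())
            lp = math.log(count / n_docs)  # log prior of class y
            for t in tokens:  # add-one smoothed log likelihood of each token
                lp += math.log((self.word_counts[y][t] + 1) / (total + v))
            if lp > best_lp:
                best, best_lp = y, lp
        return best


class TwoLevelFilter:
    """Route each email to a domain, then apply that domain's spam model."""

    def fit(self, emails, labels):
        grouped = defaultdict(lambda: ([], []))
        all_tokens = [tokenize(e) for e in emails]
        for tokens, y in zip(all_tokens, labels):
            docs, ys = grouped[assign_domain(tokens)]
            docs.append(tokens)
            ys.append(y)
        # Level 2: one classifier per domain seen in training.
        self.models = {d: NaiveBayes().fit(docs, ys) for d, (docs, ys) in grouped.items()}
        # Assumed fallback for domains unseen in training: one global model.
        self.fallback = NaiveBayes().fit(all_tokens, labels)
        return self

    def predict(self, email):
        tokens = tokenize(email)
        domain = assign_domain(tokens)
        model = self.models.get(domain, self.fallback)
        return domain, model.predict(tokens)


# Toy usage with invented emails (not data from the paper).
TRAIN = [
    ("cheap pills from online doctor no prescription", "spam"),
    ("your clinic appointment with the doctor is confirmed", "ham"),
    ("instant loan approval guaranteed money now", "spam"),
    ("your bank statement for march is available", "ham"),
    ("earn a university degree without any exam", "spam"),
    ("the exam for the course is scheduled on monday", "ham"),
]
filt = TwoLevelFilter().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(filt.predict("buy cheap pills now"))  # ('health', 'spam')
```

Training one model per domain means each classifier only has to separate spam from legitimate mail within a narrower vocabulary, which is the intuition behind the two-level design. In a faithful implementation, the keyword lexicons and the bag-of-words features would be replaced by whatever domain categorization and semantic representation the method actually uses.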
