An Evidential Spam-Filtering Framework

ABSTRACT Spam, also known as unsolicited bulk e-mail (UBE), has recently become a serious threat that degrades the usability of legitimate e-mail. In this article, an evidential spam-filtering framework is proposed. The Dempster–Shafer theory of evidence (D–S theory), a useful tool for handling uncertainty, is integrated into the proposed approach. Five representative features from an e-mail header are analyzed. The framework is trained with a machine-learning algorithm on e-mail headers with known classifications. When the framework is applied to a given e-mail header, that header's representative features are quantified. Whereas classical probability theory forces probabilities to be assigned even when information is inadequate, our approach assigns basic probability assignments (BPAs) to every word in an e-mail subject in a more flexible way, thus providing a more reasonable result. Finally, the BPAs are combined and transformed into pignistic probabilities for decision-making. Empirical trials on real-world datasets show the efficiency of the proposed framework.
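The last two steps of the abstract (combining BPAs and transforming them into pignistic probabilities) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a two-element frame {spam, ham}, assumes the combination is Dempster's rule, and uses made-up BPA values for two hypothetical subject words. The names `combine`, `pignistic`, `SPAM`, `HAM`, and `THETA` are illustrative only.

```python
from itertools import product

SPAM, HAM = "spam", "ham"
THETA = frozenset({SPAM, HAM})  # the whole frame: mass left on "unknown"


def combine(m1, m2):
    """Dempster's rule of combination for two BPAs keyed by frozenset."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass that would fall on the empty set counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the BPAs cannot be combined")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def pignistic(m):
    """Pignistic transform: split each focal set's mass evenly over its elements."""
    betp = {SPAM: 0.0, HAM: 0.0}
    for focal, w in m.items():
        for hypothesis in focal:
            betp[hypothesis] += w / len(focal)
    return betp


# Hypothetical BPAs for two subject words (values invented for illustration).
# Unlike a probability distribution, part of the mass may stay on THETA.
word_a = {frozenset({SPAM}): 0.6, frozenset({HAM}): 0.1, THETA: 0.3}
word_b = {frozenset({SPAM}): 0.5, frozenset({HAM}): 0.2, THETA: 0.3}

fused = combine(word_a, word_b)
betp = pignistic(fused)
label = SPAM if betp[SPAM] > betp[HAM] else HAM
print(betp, label)
```

The mass left on `THETA` is what lets a word with little training evidence stay mostly uncommitted instead of being forced into a spam or ham probability. The pignistic transform redistributes that mass only at decision time.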
