Paper information - A fuzzy similarity approach for automated spam filtering

A fuzzy similarity approach for automated spam filtering

E-mail spam has become an epidemic problem that can negatively affect the usability of electronic mail as a means of communication. Besides wasting the time and effort users spend scanning and deleting the massive amount of junk e-mail they receive, it consumes network bandwidth and storage space, slows down e-mail servers, and provides a medium for distributing harmful and/or offensive content. Several machine learning approaches have been applied to this problem. In this paper, we explore a new approach based on fuzzy similarity that can automatically classify e-mail messages as spam or legitimate. We study its performance with various conjunction and disjunction operators on several datasets. The results are promising compared with a naive Bayesian classifier: classification accuracy above 97% and low false positive rates are achieved in many test cases.
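The abstract does not spell out the similarity measure, so the following is only a minimal sketch of how a fuzzy similarity classifier of this general kind might look. It assumes a common form of fuzzy similarity: the sum of term-wise conjunctions (t-norms) divided by the sum of term-wise disjunctions (t-conorms). The membership function, the operator pairs, the tie-breaking rule, and all function names (`term_memberships`, `fuzzy_similarity`, `classify`) are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def product(a, b):
    """Algebraic product, a common alternative conjunction (t-norm)."""
    return a * b


def prob_sum(a, b):
    """Probabilistic sum, the disjunction (t-conorm) dual to the product."""
    return a + b - a * b


def term_memberships(tokens):
    """Map each term to a degree in [0, 1].

    Assumption: the degree is the term count divided by the count of the
    most frequent term. The paper may define memberships differently.
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    peak = max(counts.values())
    return {term: count / peak for term, count in counts.items()}


def fuzzy_similarity(msg, category, t_norm, t_conorm):
    """Ratio of summed conjunctions to summed disjunctions over all terms."""
    terms = set(msg) | set(category)
    num = sum(t_norm(msg.get(t, 0.0), category.get(t, 0.0)) for t in terms)
    den = sum(t_conorm(msg.get(t, 0.0), category.get(t, 0.0)) for t in terms)
    return num / den if den else 0.0


def classify(tokens, spam_profile, legit_profile, t_norm=min, t_conorm=max):
    """Label a message by whichever category profile it is more similar to.

    Assumption: ties go to "legitimate" to avoid false positives.
    """
    msg = term_memberships(tokens)
    spam_score = fuzzy_similarity(msg, spam_profile, t_norm, t_conorm)
    legit_score = fuzzy_similarity(msg, legit_profile, t_norm, t_conorm)
    return "spam" if spam_score > legit_score else "legitimate"


if __name__ == "__main__":
    # Toy category profiles built from made-up token lists.
    spam_profile = term_memberships("free money win free prize".split())
    legit_profile = term_memberships("meeting agenda project report".split())
    print(classify("win free money".split(), spam_profile, legit_profile))
    print(classify("win free money".split(), spam_profile, legit_profile,
                   t_norm=product, t_conorm=prob_sum))
```

Passing the operators as parameters makes it easy to swap in other conjunction and disjunction pairs, which is the kind of comparison the abstract describes.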

El-Sayed M. El-Alfy | Fares S. Al-Qunaieer

[1] William S. Yerazunis. Sparse Binary Polynomial Hashing and the CRM114 Discriminator, 2006.

[2] Bogdan Hoanca, et al. How good are our weapons in the spam wars?, 2006, IEEE Technology and Society Magazine.

[3] Ray Hunt, et al. Current and New Developments in Spam Filtering, 2006, 2006 14th IEEE International Conference on Networks.

[4] Pat Langley, et al. An Analysis of Bayesian Classifiers, 1992, AAAI.

[5] Irena Koprinska, et al. A neural network based approach to automated e-mail classification, 2003, Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003).

[6] Hinrich Schütze, et al. Introduction to information retrieval, 2008.

[7] Masaaki Tanaka, et al. Bayesian Spam Filterを用いた要約の自動分類の試み [An attempt at automatic classification of summaries using a Bayesian spam filter], 2006.

[8] Debzani Deb, et al. A Trainable Fuzzy Spam Detection System, 2004.

[9] Mikko T. Siponen, et al. Effective Anti-Spam Strategies in Companies: An International Study, 2006, Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06).

[10] Georgios Paliouras, et al. Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach, 2000, ArXiv.

[11] Ridvan Saraçoglu, et al. A fuzzy clustering approach for finding similar documents using a novel similarity measure, 2007, Expert Syst. Appl.

[12] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[13] Constantine D. Spyropoulos, et al. An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages, 2000, SIGIR '00.

[14] William W. Cohen. Learning Rules that Classify E-Mail, 1996.

[15] Harris Drucker, et al. Support vector machines for spam categorization, 1999, IEEE Trans. Neural Networks.

[16] Martin F. Porter, et al. An algorithm for suffix stripping, 1997, Program.

[17] Christopher D. Manning, et al. Introduction to Information Retrieval, 2010, J. Assoc. Inf. Sci. Technol.

[18] Georgios Paliouras, et al. A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists, 2004, Information Retrieval.

[19] Vangelis Karkaletsis, et al. A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists, 2003.

[20] G. Manning. The use of the DAP, a massively parallel computing system, for information retrieval and processing, 1989.

[21] Susan T. Dumais, et al. A Bayesian Approach to Filtering Junk E-Mail, 1998, AAAI 1998.

[22] John Yen, et al. A fuzzy similarity approach in text classification task, 2000, Ninth IEEE International Conference on Fuzzy Systems. FUZZ-IEEE 2000 (Cat. No.00CH37063).

[23] Dave C. Trudgian. Spam Classification Using Nearest Neighbour Techniques, 2004, IDEAL.