SPONGY (SPam ONtoloGY): Email Classification Using Two-Level Dynamic Ontology

Email is one of the most common communication methods between people on the Internet. However, growing misuse and abuse of email has produced an increasing volume of spam in recent years. An experimental system was designed and implemented to test the hypothesis that an ontology-based method would outperform existing techniques, and the experimental results showed that the proposed ontology-based approach improves spam filtering accuracy significantly. In this paper, two levels of ontology spam filters were implemented: a first-level global ontology filter and a second-level user-customized ontology filter. The global ontology filter filtered about 91% of spam, which is comparable with other methods. The user-customized ontology filter was created based on the specific user's background and on the filtering mechanism used to create the global ontology filter. The main contributions of the paper are (1) to introduce an ontology-based multilevel filtering technique that uses both a global ontology and an individual filter for each user to increase spam filtering accuracy, and (2) to create a spam filter in the form of an ontology that is user-customized, scalable, and modularized, so that it can be embedded in many other systems for better performance.
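The two-level arrangement described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `KeywordFilter` class, the term sets, and the function `two_level_classify` are hypothetical, and a simple keyword match stands in for an actual ontology. The sketch also assumes that a message rejected by the global filter is final, and that every other message is passed to the user's own filter for a second opinion.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class KeywordFilter:
    """Hypothetical stand-in for an ontology filter (keyword matching only)."""
    spam_terms: Set[str] = field(default_factory=set)
    ham_terms: Set[str] = field(default_factory=set)

    def classify(self, text: str) -> Optional[str]:
        """Return 'spam', 'legitimate', or None when the filter is undecided."""
        words = set(text.lower().split())
        spam_hits = len(words & self.spam_terms)
        ham_hits = len(words & self.ham_terms)
        if spam_hits > ham_hits:
            return "spam"
        if ham_hits > spam_hits:
            return "legitimate"
        return None


def two_level_classify(text: str,
                       global_filter: KeywordFilter,
                       user_filter: KeywordFilter) -> str:
    """Apply the global filter first, then the user-customized filter."""
    # Level 1: the shared, global filter (assumed to be final for spam).
    if global_filter.classify(text) == "spam":
        return "spam"
    # Level 2: the per-user filter decides on whatever level 1 let through.
    user_verdict = user_filter.classify(text)
    # Assumption: messages neither filter flags are treated as legitimate.
    return user_verdict or "legitimate"


# Illustrative filters; the term lists are made up for this example.
shared = KeywordFilter(spam_terms={"lottery", "viagra"})
personal = KeywordFilter(spam_terms={"mortgage"}, ham_terms={"seminar"})
```

With these illustrative filters, a message mentioning "lottery" is rejected at the first level, while a message mentioning "mortgage" passes the global filter and is rejected only by this particular user's filter, which is the kind of per-user customization the second level is meant to provide.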
