Spam Filtering by Using a Compound Method of Feature Selection

The growing volume of spam has become a nuisance for internet users. In recent years, the application of machine learning techniques to automatic spam filtering has attracted the attention of many researchers. This article presents a spam filtering system based on the AdaBoost algorithm. In the proposed method, the terms that occur in an email are used as the basic features for classifying it, so feature selection plays an important role in improving filtering effectiveness. The proposed system uses a compound feature selection method to identify relevant features and remove irrelevant ones, and it is evaluated on the standard Ling-Spam data set. For comparison, several other algorithms are applied to the same data and their results are compared with those of the proposed system. The experiments show that the system achieves an acceptable efficiency of about 0.983.
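
The pipeline described above (term features, feature selection, then AdaBoost) can be illustrated with a minimal sketch. The abstract does not say which criteria make up the compound selection method, so this example assumes a two-stage filter: a document-frequency threshold followed by an information-gain ranking. The weak learners are assumed to be one-term decision stumps, and the six-message corpus is invented for illustration; it is not Ling-Spam. All function names and parameters (`select_terms`, `min_df`, `top_k`, `train_adaboost`, `rounds`) are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Binary term-presence features: each email becomes a set of terms.
    return set(text.lower().split())

def entropy(pos, neg):
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(docs, labels, term):
    # Labels are +1 (spam) and -1 (legitimate).
    def h(ys):
        return entropy(sum(y == 1 for y in ys), sum(y == -1 for y in ys))
    n = len(docs)
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    return h(labels) - len(present) / n * h(present) - len(absent) / n * h(absent)

def select_terms(docs, labels, min_df=2, top_k=2):
    # Assumed compound selection: drop rare terms, then rank by information gain.
    df = Counter(t for d in docs for t in d)
    candidates = [t for t, c in df.items() if c >= min_df]
    ranked = sorted(candidates, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:top_k]

def train_adaboost(docs, labels, terms, rounds=3):
    n = len(docs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Pick the stump (term, polarity) with the lowest weighted error.
        best = None
        for t in terms:
            for polarity in (1, -1):
                preds = [polarity if t in d else -polarity for d in docs]
                err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, polarity, preds)
        err, t, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, polarity))
        # Increase the weight of misclassified emails, then renormalize.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, labels, preds)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble

def predict(ensemble, doc):
    score = sum(a * (pol if t in doc else -pol) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy corpus, invented for illustration only.
emails = [
    ("win free money now", 1),
    ("free prize claim now", 1),
    ("cheap money offer free", 1),
    ("meeting agenda for monday", -1),
    ("linguistics conference call for papers", -1),
    ("agenda for the conference", -1),
]
docs = [tokenize(text) for text, _ in emails]
labels = [label for _, label in emails]

selected = select_terms(docs, labels)
model = train_adaboost(docs, labels, selected)
```

A new message is classified with `predict(model, tokenize("claim your free prize"))`, which returns +1 for spam and -1 for legitimate mail. On a corpus this small a single stump already separates the classes, so the later boosting rounds add nothing; on real data the reweighting step is what steers later stumps toward the emails that earlier ones misclassified.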
