A Fuzzy Adaptive Multi-Population Parallel Genetic Algorithm for Spam Filtering

E-mail is one of the cheapest and fastest means of communication. However, a principal problem for any internet user is the growing volume of spam, which makes an efficient spam filtering method imperative. Feature selection is one of the most important factors influencing classification accuracy. To improve spam prediction, this paper proposes a new fuzzy adaptive multi-population parallel genetic algorithm (FAMGA) for feature selection. A few studies have reported multi-swarm strategies for maintaining population diversity, but dynamic parameter setting has received little further attention. The proposed method is based on multiple subpopulations, each of which runs in an independent memory space. To control the subpopulations adaptively, we put forward two regulation strategies: population adjustment and subpopulation adjustment. In subpopulation adjustment, a controller adjusts the crossover rate of each subpopulation; in population adjustment, a controller adjusts the size of each subpopulation. Three publicly available benchmark corpora for spam filtering, PU1, Ling-Spam and SpamAssassin, are used in our experiments. The experimental results show that the proposed method improves spam filtering performance and is significantly better than the other feature selection methods compared. These results indicate that the multi-population regulation strategy can find high-quality feature subsets and prevent premature convergence of the population.
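The two regulation strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the fuzzy rules, so `adjust_crossover` and `adjust_sizes` are hypothetical crisp rules standing in for the fuzzy controllers, `fitness` is a toy stand-in for classifier accuracy, and the subpopulations are evolved sequentially rather than in separate memory spaces. All names, thresholds and default parameters are assumptions made for illustration.

```python
import random

def fitness(mask, relevant):
    # Toy stand-in for classifier accuracy (assumption): reward relevant
    # features that are selected, lightly penalize every other selected one.
    hits = sum(1 for i in relevant if mask[i])
    return hits - 0.1 * (sum(mask) - hits)

def diversity(pop):
    # Mean per-gene disagreement: 0.0 if all individuals agree, at most 0.5.
    n, length = len(pop), len(pop[0])
    total = 0.0
    for g in range(length):
        p = sum(ind[g] for ind in pop) / n
        total += 2 * p * (1 - p)
    return total / length

def adjust_crossover(rate, div, lo=0.5, hi=0.95):
    # Subpopulation adjustment (hypothetical rule, not the paper's fuzzy
    # controller): raise the crossover rate when diversity is low,
    # lower it when diversity is high, and clamp to [lo, hi].
    if div < 0.1:
        rate += 0.05
    elif div > 0.3:
        rate -= 0.05
    return min(hi, max(lo, rate))

def adjust_sizes(sizes, best_fits, min_size=4):
    # Population adjustment (hypothetical rule): move one slot from the
    # weakest subpopulation to the strongest; the total size stays constant.
    hi = best_fits.index(max(best_fits))
    lo = best_fits.index(min(best_fits))
    sizes = sizes[:]
    if hi != lo and sizes[lo] > min_size:
        sizes[lo] -= 1
        sizes[hi] += 1
    return sizes

def step(pop, size, cx_rate, mut_rate, fit, rng):
    # One generation: elitism, binary tournament selection,
    # one-point crossover, bit-flip mutation.
    nxt = [max(pop, key=fit)[:]]
    while len(nxt) < size:
        a, b = (max(rng.sample(pop, 2), key=fit) for _ in range(2))
        child = a[:]
        if rng.random() < cx_rate:
            cut = rng.randrange(1, len(a))
            child = a[:cut] + b[cut:]
        nxt.append([g ^ (rng.random() < mut_rate) for g in child])
    return nxt

def famga_sketch(n_features=20, relevant=(0, 3, 7, 11), n_subpops=3,
                 sub_size=10, generations=30, seed=0):
    rng = random.Random(seed)
    fit = lambda mask: fitness(mask, relevant)
    subpops = [[[rng.randint(0, 1) for _ in range(n_features)]
                for _ in range(sub_size)] for _ in range(n_subpops)]
    sizes = [sub_size] * n_subpops
    rates = [0.7] * n_subpops
    for _ in range(generations):
        for k in range(n_subpops):
            rates[k] = adjust_crossover(rates[k], diversity(subpops[k]))
            subpops[k] = step(subpops[k], sizes[k], rates[k], 0.02, fit, rng)
        sizes = adjust_sizes(sizes, [max(map(fit, p)) for p in subpops])
    best = max((ind for p in subpops for ind in p), key=fit)
    return best, fit(best), sizes, rates

if __name__ == "__main__":
    best, score, sizes, rates = famga_sketch()
    print("selected features:", [i for i, g in enumerate(best) if g])
    print("score:", score, "sizes:", sizes, "rates:", rates)
```

In a real setting the fitness would be the accuracy of a classifier trained on the selected features, and each subpopulation could run in its own process to match the independent memory spaces described above.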
