Uninterrupted Approaches for Spam Detection Based on SVM and AIS

The proliferation of spam has increasingly caused serious problems for our daily electronic communications. Jupiter Research estimates that an e-mail user will receive, on average, more than 3,900 spam mails in 2007, compared with an average of 40 spam mails in 1999. This paper proposes uninterrupted detection approaches for spam in e-mail streams, based on an Incremental Support Vector Machine and an Artificial Immune System. These approaches use a window to hold several classifiers: each classifier labels an e-mail independently, and the final label is decided by majority voting. The exceeding margin update technique of the support vector machine (SVM) is also used to update each classifier in the window dynamically. A sliding window is used to purge out-of-date knowledge. When a new batch of e-mails arrives, the rightmost classifier in the window is removed, the remaining classifiers each slide one position to the right, and a classifier newly generated from the previous batch takes the leftmost position. These two techniques give our algorithms dynamic and adaptive properties, as well as the ability to track changes in e-mail content and in users' interests in an uninterrupted way. Eight methods, which can be regarded as different implementations of uninterrupted detection for e-mail streams, including Hamming Distance (with and without mutation), Included Angle, SVM and Weighted Voting, are developed and elaborated in this paper. Experiments on two public benchmark corpora, PU1 and Ling, are conducted to verify the validity of the proposed methods. The eight methods are compared with current methods in terms of accuracy, precision, recall, miss rate and detection speed. The results demonstrate that the proposed uninterrupted detection approaches are a promising way to contain spam.
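
The window-and-vote mechanism described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. Each per-batch classifier here is a hypothetical 1-nearest-neighbour matcher using Hamming distance over binary term vectors (a simple stand-in for the AIS detectors). The exceeding margin SVM update is omitted. The true labels of a batch are assumed to become available before the next batch arrives, and ties in the vote are assumed to default to ham. The names `train_nearest_hamming`, `SlidingWindowEnsemble` and `run_stream` are illustrative only.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

Vector = Tuple[int, ...]               # binary feature vector (1 = term present)
Labelled = Tuple[Vector, int]          # (feature vector, label); 1 = spam, 0 = ham
Classifier = Callable[[Vector], int]   # returns 1 for spam, 0 for ham


def hamming(a: Vector, b: Vector) -> int:
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def train_nearest_hamming(batch: Sequence[Labelled]) -> Classifier:
    """Hypothetical per-batch classifier: 1-nearest-neighbour by Hamming
    distance over the labelled e-mails of one (non-empty) batch."""
    memory = list(batch)

    def classify(x: Vector) -> int:
        _, label = min(memory, key=lambda item: hamming(item[0], x))
        return label

    return classify


class SlidingWindowEnsemble:
    """Window of at most `size` classifiers; index 0 is the leftmost (newest)."""

    def __init__(self, size: int) -> None:
        # With maxlen set, appendleft() silently drops the rightmost (oldest) item.
        self.window: Deque[Classifier] = deque(maxlen=size)

    def predict(self, x: Vector) -> int:
        """Majority vote; a tie or an empty window yields ham (assumption)."""
        spam_votes = sum(clf(x) for clf in self.window)
        return 1 if spam_votes * 2 > len(self.window) else 0

    def add_batch(self, batch: Sequence[Labelled],
                  train: Callable[[Sequence[Labelled]], Classifier]) -> None:
        """Insert a classifier trained on `batch` at the left; the others slide
        one position right and the rightmost one is purged."""
        self.window.appendleft(train(batch))


def run_stream(batches: Sequence[Sequence[Labelled]], size: int = 3,
               train=train_nearest_hamming) -> List[int]:
    """Label each batch with the current window, then learn from that batch."""
    ensemble = SlidingWindowEnsemble(size)
    predictions: List[int] = []
    for batch in batches:
        predictions.extend(ensemble.predict(x) for x, _ in batch)
        ensemble.add_batch(batch, train)  # assumes true labels arrive after filtering
    return predictions


if __name__ == "__main__":
    batches = [
        [((1, 1, 0, 0), 1), ((0, 0, 1, 1), 0)],
        [((1, 1, 1, 0), 1), ((0, 0, 0, 1), 0)],
        [((1, 1, 0, 0), 1), ((0, 0, 1, 1), 0)],
    ]
    print(run_stream(batches, size=2))  # -> [0, 0, 1, 0, 1, 0]
```

The first batch is labelled ham because the window starts empty. A `deque` with `maxlen` is used because `appendleft` on a full deque discards the element at the right end, which matches the "slide right, purge rightmost" behaviour described in the abstract. Swapping `train_nearest_hamming` for an included-angle matcher or an incrementally updated SVM would change only the `train` argument.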
