A Spam Filtering Method Based on Multi-modal Features Fusion

In recent years, to evade text-based spam filtering systems, spammers have embedded junk information in images and attached them to the message body. Traditional text-based filters cannot handle such image spam. To deal with spam that contains both text and images, this paper proposes a filtering method that fuses text, image, and other multi-modal features. The method first extracts text features and image features to build multiple classifiers, and then employs a fusion method to decide the final result from the outputs of those classifiers. Experimental results on the TREC dataset show that the fusion method performs better than a single classifier and achieves an accuracy of over 90%.
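The abstract does not state which fusion rule is used, so the following is only a minimal sketch of one common choice, a weighted average of per-classifier spam probabilities (late fusion). The function names `fuse_scores` and `is_spam`, the modality keys `"text"` and `"image"`, the weights, and the 0.5 threshold are all illustrative assumptions, not the paper's method.

```python
from typing import Dict


def fuse_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted-average late fusion of per-modality spam probabilities.

    `scores` maps a modality name to that classifier's spam probability in [0, 1].
    `weights` maps a modality name to a non-negative weight (assumed, e.g. tuned
    on validation data). Modalities missing from `scores` are skipped, so a
    text-only message is scored by the text classifier alone.
    """
    used = [m for m in scores if m in weights]
    total_weight = sum(weights[m] for m in used)
    if not used or total_weight <= 0:
        raise ValueError("no modality has both a score and a positive weight")
    return sum(weights[m] * scores[m] for m in used) / total_weight


def is_spam(scores: Dict[str, float], weights: Dict[str, float],
            threshold: float = 0.5) -> bool:
    """Label a message as spam when the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


# Hypothetical outputs of a text classifier and an image classifier for one
# message whose body text looks harmless but whose attached image does not.
example_scores = {"text": 0.2, "image": 0.9}
example_weights = {"text": 1.0, "image": 1.0}
print(is_spam(example_scores, example_weights))
```

In this hypothetical case the text classifier alone would pass the message, while the fused score crosses the threshold because the image classifier is confident. That is the situation the abstract describes, where spam content is moved out of the text and into an image.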
