An Efficient Method for Filtering Image-Based Spam E-mail

Spam e-mail with advertisement text embedded in images presents a great challenge to anti-spam filters. In this paper, we present a fast method to detect image-based spam e-mail. Using simple edge-based features, the method computes a vector of similarity scores between an image and a set of templates. This similarity vector is then used with support vector machines to separate spam images from other common categories of images. Our method does not require expensive OCR or even text extraction from images. Empirical results show that the method is fast and has good classification accuracy.
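The feature pipeline described above can be illustrated with a minimal sketch. The abstract does not say which edge features or similarity measure are used, so this example assumes an edge-orientation histogram compared by histogram intersection. All function names here are hypothetical. The final support vector machine step is omitted: in practice the resulting similarity vector would be passed to an SVM classifier as its input features.

```python
import math

def edge_orientation_histogram(img, bins=8, threshold=1.0):
    """Normalized histogram of gradient orientations over edge pixels.

    `img` is a grayscale image given as a list of rows of numbers.
    Assumption: central-difference gradients stand in for the paper's
    unspecified edge detector.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            if math.hypot(gx, gy) < threshold:
                continue  # not an edge pixel
            # Fold the gradient direction into [0, pi) so that opposite
            # directions count as the same orientation.
            angle = math.atan2(gy, gx) % math.pi
            hist[min(int(angle / math.pi * bins), bins - 1)] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist

def histogram_intersection(a, b):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))

def similarity_vector(img, templates, bins=8):
    """One similarity score per template; this is the classifier input."""
    feat = edge_orientation_histogram(img, bins)
    return [
        histogram_intersection(feat, edge_orientation_histogram(t, bins))
        for t in templates
    ]

# Toy data: 8x8 images with vertical and horizontal stripes.
vertical = [[255 if (x // 2) % 2 == 0 else 0 for x in range(8)] for y in range(8)]
horizontal = [[255 if (y // 2) % 2 == 0 else 0 for x in range(8)] for y in range(8)]

scores = similarity_vector(vertical, [vertical, horizontal])
print(scores)
```

On this toy data the vertical-stripe image matches the vertical template and shares no edge orientations with the horizontal one. A fixed-length vector of this kind is cheap to compute, which is consistent with the abstract's point that no OCR or text extraction is needed.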
