Incremental clustering-based spam image filtering using representative images

In this paper, an incremental spam image filtering (ISIF) approach based on visual similarity is proposed as a solution to two important practical problems that existing spam image filtering techniques do not handle well. The first is how to update a model efficiently. The second is how to cope with the lack of normal email images. The basic idea of the ISIF approach is to learn incrementally what spam images look like by clustering spam images and selecting representative images (RIs) from the clusters, and then to use the RIs to classify unknown images. An ISIF filter is updated by adding new RIs, which is efficient because retraining considers only the spam images the filter missed rather than the whole expanded training set. Since the ISIF approach relies only on spam images, it avoids the difficulty of collecting enough normal email images. Experimental results on a real spam image dataset show that an incremental filter based on the ISIF approach detects spam images with high accuracy and a low false positive rate.
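The workflow described in the abstract (cluster spam images, keep one representative per cluster, classify by similarity to the representatives, and update using only missed spam) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: images are assumed to be already reduced to numeric feature vectors, similarity is plain Euclidean distance, clustering is a simple greedy leader scheme, and the names `RIFilter`, `cluster`, `representative`, and the `radius` threshold are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]  # assumed: one precomputed feature vector per image


def distance(a: Vector, b: Vector) -> float:
    # Assumed similarity measure: Euclidean distance between feature vectors.
    return math.dist(a, b)


def cluster(features: List[Vector], radius: float) -> List[List[Vector]]:
    """Greedy leader clustering (an assumed stand-in for the paper's clustering).

    Each vector joins the first cluster whose leader (first member) lies
    within `radius`; otherwise it starts a new cluster.
    """
    clusters: List[List[Vector]] = []
    for f in features:
        for members in clusters:
            if distance(f, members[0]) <= radius:
                members.append(f)
                break
        else:
            clusters.append([f])
    return clusters


def representative(members: List[Vector]) -> Vector:
    """Pick the cluster member closest to the cluster centroid."""
    dim = len(members[0])
    centroid = [sum(m[i] for m in members) / len(members) for i in range(dim)]
    return min(members, key=lambda m: distance(m, centroid))


class RIFilter:
    """Hypothetical filter that stores only representative spam images."""

    def __init__(self, radius: float) -> None:
        self.radius = radius
        self.representatives: List[Vector] = []

    def is_spam(self, f: Vector) -> bool:
        # An image is flagged when it is close enough to any representative.
        return any(distance(f, r) <= self.radius for r in self.representatives)

    def update(self, spam_features: List[Vector]) -> int:
        """Incremental update: cluster only the spam the filter missed."""
        missed = [f for f in spam_features if not self.is_spam(f)]
        new_ris = [representative(c) for c in cluster(missed, self.radius)]
        self.representatives.extend(new_ris)
        return len(new_ris)
```

In this sketch, initial training and later updates go through the same `update` call: on an empty filter every spam image counts as missed, while later calls touch only the images that slipped through. No normal email images are needed at any point, which mirrors the motivation stated in the abstract.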
