Image spam filtering using convolutional neural networks

Spammers often embed text into images in order to avoid filtering by text-based spam filters, which result in a large number of advertisement spam images. Garbage image recognition has become one of the hotspots in the field of Internet spam filtering research. Its goal is to solve the problem that traditional spam information filtering methods encounter a sharp performance decline or even failure when filtering spam image information. Based on the clustering algorithm, this paper proposes a method to expand the data samples, which greatly improves the number of high-quality training samples and meets the needs of model training. Then, we train a convolutional neural networks using the enlarged data samples to recognize the SPAM in real time. The experimental results show that the accuracy of the model is increased by more than 14% after using the method of data augmentation. The accuracy of the model can be improved by 6% compared with other methods of data augmentation. Combined with convolutional neural networks and the proposed method of data augmentation, the accuracy of our SPAM filtering model is 7---11% higher than that of the traditional method.

[1]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[2]  Mark Dredze,et al.  Learning Fast Classifiers for Image Spam , 2007, CEAS.

[3]  Yudong Zhang,et al.  Automated classification of brain images using wavelet-energy and biogeography-based optimization , 2016, Multimedia Tools and Applications.

[4]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[5]  Qiao Liu,et al.  Feature selection for image spam classification , 2010, 2010 International Conference on Communications, Circuits and Systems (ICCCAS).

[6]  V. Gowri,et al.  Image Spam Detection through Server-Client Filtering by Tracing the Source IP of the Spammer , 2012 .

[7]  Jean Ponce,et al.  A Theoretical Analysis of Feature Pooling in Visual Recognition , 2010, ICML.

[8]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[9]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[10]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[11]  John Salvatier,et al.  Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.

[12]  Rob Fergus,et al.  Stochastic Pooling for Regularization of Deep Convolutional Neural Networks , 2013, ICLR.

[13]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[14]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[15]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Lize Gu,et al.  Real-time image recognition using weighted spatial pyramid networks , 2017, Journal of Real-Time Image Processing.

[17]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[18]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[19]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[20]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[21]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[22]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[23]  Robert H. Deng,et al.  On robust image spam filtering via comprehensive visual modeling , 2015, Pattern Recognit..

[24]  Yudong Zhang,et al.  Binary PSO with mutation operator for feature selection using decision tree applied to spam detection , 2014, Knowl. Based Syst..

[25]  Bala Srinivasan,et al.  Distributed classification for image spam detection , 2017, Multimedia Tools and Applications.

[26]  Yixian Yang,et al.  Weighted pooling for image recognition of deep convolutional neural networks , 2018, Cluster Computing.

[27]  Yasuo Kuniyoshi,et al.  Discriminative spatial pyramid , 2011, CVPR 2011.

[28]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Reza Moradi Rad,et al.  A survey of image spamming and filtering techniques , 2011, Artificial Intelligence Review.

[30]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2015, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Kyung-Ah Sohn,et al.  Graph-Based Spam Image Detection for Mobile Phone Spam Image Filtering , 2015, Int. J. Softw. Innov..

[32]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).