A survey and experimental evaluation of image spam filtering techniques

In their arms race against developers of spam filters, spammers have recently introduced the image spam trick to make the analysis of emails' body text ineffective. It consists of embedding the spam message in an attached image, which is often randomly modified to evade signature-based detection and obfuscated to prevent text recognition by OCR tools. Detecting image spam turns out to be an interesting instance of the problem of content-based filtering of multimedia data in adversarial environments, which is gaining relevance in several applications and media. In this paper we give a comprehensive survey and categorisation of the computer vision and pattern recognition techniques proposed so far against image spam, and experimentally analyse and compare some of them on real, publicly available data sets.
