AN INCREMENTAL LEARNING BASED FRAMEWORK FOR IMAGE SPAM FILTERING

Image spam remains an unsolved problem for two reasons. The first is the diversity of spamming tricks. The second is the evolving nature of image spam: as new spam constantly emerges, filters' effectiveness drops over time. In this paper, we present an effective anti-spam approach that addresses both problems. First, we propose a novel clustering filter. By exploiting a density-based clustering algorithm, the proposed filter is robust to spamming tricks. Then, we present a hierarchical framework that combines the clustering filter with other machine learning based classifiers to further improve filtering capacity. Moreover, an incremental learning mechanism is integrated so that the framework can adjust itself to overcome new image spamming tricks. We evaluate the proposed framework on two public spam corpora. The experimental results show that the proposed framework achieves high precision along with a low false positive rate.
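
The abstract does not specify the clustering filter in detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each image has already been reduced to a numeric feature vector, and it borrows the core-point test from density-based clustering: an image is flagged when enough known spam vectors lie within a fixed radius. The class name `DensityFilter`, the parameters `eps` and `min_pts`, and the `learn` method are hypothetical names introduced for illustration.

```python
import math


class DensityFilter:
    """Toy density-based spam test with incremental updates (illustrative only)."""

    def __init__(self, eps, min_pts):
        self.eps = eps            # neighbourhood radius (assumed parameter)
        self.min_pts = min_pts    # neighbours needed to flag an image (assumed parameter)
        self.spam_features = []   # feature vectors of images confirmed as spam

    def is_spam(self, feature):
        # Count stored spam vectors within eps of the query vector.
        neighbours = sum(
            1 for s in self.spam_features if math.dist(s, feature) <= self.eps
        )
        return neighbours >= self.min_pts

    def learn(self, feature):
        # Incremental step: remember a newly confirmed spam image
        # without retraining on the previously stored ones.
        self.spam_features.append(tuple(feature))


spam_filter = DensityFilter(eps=0.5, min_pts=2)
spam_filter.learn((0.0, 0.0))
spam_filter.learn((0.1, 0.0))
print(spam_filter.is_spam((0.05, 0.0)))  # near two known spam vectors
print(spam_filter.is_spam((5.0, 5.0)))   # far from all known spam vectors
```

In this sketch a new spamming trick is initially missed, but once a few examples of it are confirmed and passed to `learn`, nearby images start to be flagged. That is the behaviour the incremental learning mechanism is meant to provide; the feature extraction and the other classifiers in the hierarchy are left out here.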
