A DBN-Based Classifying Approach to Discover the Internet Water Army

The Internet water army (IWA) usually refers to hidden paid posters and collusive spammers, who already pose a serious threat to cyber security. Many researchers have begun to study how to identify the IWA effectively. In the data mining context, most current efforts to distinguish IWA from non-IWA users rely on classification algorithms such as Bayesian networks, support vector machines (SVM), and k-nearest neighbors (KNN). However, Bayesian networks require a strong conditional independence assumption and KNN incurs high computational cost, which can limit the effectiveness of these approaches in real industrial applications. Deep approaches based on neural networks have therefore gradually emerged as a promising direction for IWA identification. One main problem remains: how to balance deep learning against its computational cost in a hierarchical architecture. More specifically, combining heuristic training design at the learning level with concurrent computation at the computing level is a challenging issue. In this paper, we propose a collaborative hierarchical approach based on the deep belief network (DBN) for IWA identification. First, we build a DBN-based collaborative model with a hierarchical classification mechanism. Then, on the Hadoop platform, we use Downpour stochastic gradient descent (Downpour SGD) for DBN pre-training. Finally, we design a dynamic workflow to manage the whole learning-based classification process. The experimental evaluation shows the validity of our approach.
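To make the pre-training step more concrete, the following Python sketch shows Downpour-style asynchronous SGD applied to a single restricted Boltzmann machine (RBM) layer, the building block that a DBN stacks and pre-trains greedily, one layer at a time (Hinton et al., 2006). The abstract does not describe the authors' Hadoop implementation, so everything here is an illustrative assumption: the names `ParameterServer`, `Worker`, `cd1_gradient`, and `downpour_pretrain` are hypothetical, gradients come from one-step contrastive divergence (CD-1), and a seeded round-robin loop stands in for replicas that would really run concurrently (for example, as Hadoop tasks). Each hypothetical worker holds a data shard, refreshes a possibly stale copy of the parameters every `n_fetch` steps, keeps training on that local copy, and pushes its accrued gradients to the shared server every `n_push` steps.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    z = math.exp(-abs(x))
    return 1.0 / (1.0 + z) if x >= 0 else z / (1.0 + z)

def add_scaled(p, g, a):
    """In place: p += a * g, where p and g are (W, b, c) parameter tuples."""
    for row, g_row in zip(p[0], g[0]):
        for j, x in enumerate(g_row):
            row[j] += a * x
    for vec, g_vec in ((p[1], g[1]), (p[2], g[2])):
        for k, x in enumerate(g_vec):
            vec[k] += a * x

def hidden_probs(v, W, c):
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(c))]

def visible_probs(h, W, b):
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(b))]

def cd1_gradient(v0, p, rng):
    """CD-1 estimate of the log-likelihood gradient of a binary RBM for one example."""
    W, b, c = p
    h0 = hidden_probs(v0, W, c)
    h0_sample = [1.0 if rng.random() < q else 0.0 for q in h0]
    v1 = visible_probs(h0_sample, W, b)  # one Gibbs step: reconstruction probabilities
    h1 = hidden_probs(v1, W, c)
    dW = [[v0[i] * h0[j] - v1[i] * h1[j] for j in range(len(c))] for i in range(len(b))]
    return dW, [v0[i] - v1[i] for i in range(len(b))], [h0[j] - h1[j] for j in range(len(c))]

class ParameterServer:
    """Holds the shared RBM parameters and applies pushed gradients in arrival order."""
    def __init__(self, n_vis, n_hid, lr, rng):
        W = [[rng.gauss(0.0, 0.01) for _ in range(n_hid)] for _ in range(n_vis)]
        self.params, self.lr, self.pushes = (W, [0.0] * n_vis, [0.0] * n_hid), lr, 0

    def fetch(self):
        W, b, c = self.params
        return [row[:] for row in W], b[:], c[:]  # a copy, so it can go stale

    def push(self, grads):
        add_scaled(self.params, grads, self.lr)  # gradient ascent on the CD-1 estimate
        self.pushes += 1

class Worker:
    """One hypothetical model replica that trains on its own data shard."""
    def __init__(self, shard, lr, n_fetch, n_push):
        self.shard, self.lr, self.n_fetch, self.n_push = shard, lr, n_fetch, n_push
        self.steps, self.local, self.acc = 0, None, None

    def step(self, server, v, rng):
        if self.steps % self.n_fetch == 0:
            self.local = server.fetch()  # refresh the local copy every n_fetch steps
        if self.acc is None:
            W, b, c = self.local
            self.acc = ([[0.0] * len(c) for _ in W], [0.0] * len(b), [0.0] * len(c))
        g = cd1_gradient(v, self.local, rng)
        add_scaled(self.acc, g, 1.0)  # accrue gradients for the next push
        add_scaled(self.local, g, self.lr)  # keep training on the local copy
        self.steps += 1
        if self.steps % self.n_push == 0:
            self.flush(server)

    def flush(self, server):
        if self.acc is not None:
            server.push(self.acc)
            self.acc = None

def reconstruction_error(data, p):
    """Mean squared error between each input and its mean-field reconstruction."""
    W, b, c = p
    err = 0.0
    for v in data:
        r = visible_probs(hidden_probs(v, W, c), W, b)
        err += sum((x - y) ** 2 for x, y in zip(v, r))
    return err / (len(data) * len(data[0]))

def downpour_pretrain(data, n_hid=3, n_workers=3, epochs=200, lr=0.05, n_fetch=2, n_push=2, seed=0):
    rng = random.Random(seed)
    server = ParameterServer(len(data[0]), n_hid, lr, rng)
    workers = [Worker(data[k::n_workers], lr, n_fetch, n_push) for k in range(n_workers)]
    for _ in range(epochs):
        # Round-robin interleaving stands in for replicas that would run concurrently.
        for t in range(max(len(w.shard) for w in workers)):
            for w in workers:
                if t < len(w.shard):
                    w.step(server, w.shard[t], rng)
    for w in workers:
        w.flush(server)  # push any gradients still buffered
    return server

if __name__ == "__main__":
    toy = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 6
    trained = downpour_pretrain(toy)
    print(trained.pushes, round(reconstruction_error(toy, trained.params), 4))
```

Running the module trains the toy RBM on two complementary binary patterns and prints the number of gradient pushes and the mean squared reconstruction error. Larger `n_fetch` and `n_push` values reduce communication with the server at the price of staler parameters, which is one concrete form of the balance between learning and computation cost described above. A full DBN would repeat this procedure layer by layer, feeding each trained layer's hidden probabilities to the next.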
