Detecting redirection spam using a multilayer perceptron neural network

Retrieving quality information from the Web is essential for every search engine. However, this quality is undermined by spammers who make heavy use of malicious redirections to carry out phishing, trigger malware downloads, or attain high search engine rankings. Malicious redirections present irrelevant content to search users, which degrades user satisfaction and also wastes network bandwidth. In this paper, we propose a neural framework for detecting redirection spam. We use a feed-forward multilayer perceptron network trained with the scaled conjugate gradient algorithm, which enables very fast classification of URLs that lead to redirection spam. We empirically investigated the number of hidden layers and observed that the network gives better accuracy when trained with two hidden layers. We validated the proposed approach on a dataset of 2,383 URLs and were able to detect spammed redirections with high accuracy. The results indicate that neural networks are a very effective technique for modeling redirection spam detection.
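The abstract names the method's two ingredients: a feed-forward multilayer perceptron with two hidden layers, and the scaled conjugate gradient (SCG) training algorithm. It does not give the URL features, layer widths, activation functions, or loss. The following is a minimal NumPy sketch built on assumptions that do not come from the paper. Random synthetic vectors stand in for URL feature vectors. The layer sizes `[4, 8, 8, 1]`, tanh hidden units, the sigmoid output with cross-entropy loss, and the SCG constants `sigma` and `lam` are all illustrative choices. The training loop follows the step structure of Møller's SCG:

- a finite-difference estimate of the curvature along the search direction,
- a damping term that keeps that estimate positive, and
- an accept/reject test that replaces a line search.

```python
import numpy as np


def forward(w, sizes, X):
    """Unpack flat weights into (W, b) per layer; return layers, activations, output logits."""
    layers, i = [], 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        W = w[i:i + fan_in * fan_out].reshape(fan_in, fan_out)
        i += fan_in * fan_out
        layers.append((W, w[i:i + fan_out]))
        i += fan_out
    acts = [X]
    for W, b in layers[:-1]:
        acts.append(np.tanh(acts[-1] @ W + b))  # tanh hidden layers (assumed)
    W_out, b_out = layers[-1]
    return layers, acts, (acts[-1] @ W_out + b_out).ravel()


def init_params(sizes, rng):
    """Pack randomly initialised weights and zero biases into one flat vector."""
    parts = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        parts.append(rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=fan_in * fan_out))
        parts.append(np.zeros(fan_out))
    return np.concatenate(parts)


def loss_and_grad(w, sizes, X, y):
    """Mean binary cross-entropy of a sigmoid output and its gradient via backprop."""
    layers, acts, z = forward(w, sizes, X)
    loss = np.mean(np.logaddexp(0.0, z) - y * z)  # numerically stable BCE
    delta = (0.5 * (1.0 + np.tanh(0.5 * z)) - y)[:, None] / len(y)  # dLoss/dlogit
    grads = []
    for k in range(len(layers) - 1, -1, -1):
        W, _ = layers[k]
        grads.append(np.concatenate([(acts[k].T @ delta).ravel(), delta.sum(axis=0)]))
        if k > 0:
            delta = (delta @ W.T) * (1.0 - acts[k] ** 2)  # back through tanh
    return loss, np.concatenate(grads[::-1])


def predict(w, sizes, X):
    return (forward(w, sizes, X)[2] > 0).astype(float)


def train_scg(w, fg, max_iter=300, sigma=1e-4, lam=1e-6, tol=1e-10):
    """Scaled conjugate gradient (Moller, 1993); fg(w) returns (loss, gradient)."""
    E, g = fg(w)
    r = -g
    p = r.copy()
    lam_bar, success = 0.0, True
    for k in range(1, max_iter + 1):
        p_sq = p @ p
        if success:  # curvature along p from a finite-difference Hessian-vector product
            sigma_k = sigma / np.sqrt(p_sq)
            delta = p @ ((fg(w + sigma_k * p)[1] - g) / sigma_k)
        delta += (lam - lam_bar) * p_sq  # Levenberg-Marquardt style damping
        if delta <= 0:  # force a positive curvature estimate
            lam_bar = 2.0 * (lam - delta / p_sq)
            delta = -delta + lam * p_sq
            lam = lam_bar
        mu = p @ r
        alpha = mu / delta  # step size without a line search
        E_new, g_new = fg(w + alpha * p)
        Delta = 2.0 * delta * (E - E_new) / mu ** 2  # fit of the local quadratic model
        if Delta >= 0:  # loss did not increase: accept the step
            w = w + alpha * p
            E, g = E_new, g_new
            r_new = -g_new
            lam_bar, success = 0.0, True
            if k % w.size == 0:  # periodic restart along steepest descent
                p = r_new.copy()
            else:
                p = r_new + ((r_new @ r_new - r_new @ r) / mu) * p
            r = r_new
            if Delta >= 0.75:
                lam *= 0.25
        else:  # reject the step and keep the current weights
            lam_bar, success = lam, False
        if Delta < 0.25:
            lam += delta * (1.0 - Delta) / p_sq
        if r @ r < tol:
            break
    return w


# Hypothetical stand-in data: 4 random features, label set by a hidden linear rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
sizes = [4, 8, 8, 1]  # two hidden layers; the widths are assumptions


def fg(w):
    return loss_and_grad(w, sizes, X, y)


w0 = init_params(sizes, rng)
w = train_scg(w0, fg)
print("training accuracy:", np.mean(predict(w, sizes, X) == y))
```

When an iteration succeeds, it costs at most two extra gradient evaluations and no line search. That is the usual argument for SCG's speed. The damping `lam` grows when the quadratic model predicts the loss change poorly and shrinks when it predicts it well.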
