Detecting Web Spams Using Evidence Theory

Search engines are the primary tools for finding information on the Web. Determining the reliability of the results returned by a typical search engine is a daunting challenge, mainly because of Web spam. New types of Web spam are introduced continuously, which makes it very difficult to judge the accuracy of the results. The problem can be viewed as one of reasoning under uncertainty. This paper presents a methodology for predicting Web spam in which the spamicity of hosts is formulated as a reasoning problem. The approach is based on evidence theory, a mathematical framework for reasoning and prediction also known as Dempster-Shafer Theory (DST). The key benefit of the approach for Web spam detection is DST's ability to deal with uncertainty. When a new type of spam is introduced, the system lacks reasonable prior knowledge about it. This is where DST provides a more reliable solution, detecting spam without any prior information. The paper presents detailed statistical evaluations of the proposed approach and reports an accuracy of 99.27% in detecting Web spam.
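
To make the reasoning step concrete, the following is a minimal sketch of how two pieces of evidence about a host could be fused with Dempster's rule of combination, the standard combination rule in DST. The abstract does not specify the frame of discernment, the features, or the mass assignments used in the paper, so the two-element frame {spam, ham}, the names `content_evidence` and `link_evidence`, and all numeric masses below are hypothetical and chosen only for illustration.

```python
from itertools import product

# Assumed frame of discernment: a host is either spam or ham (non-spam).
SPAM = frozenset({"spam"})
HAM = frozenset({"ham"})
EITHER = SPAM | HAM  # mass placed here expresses ignorance ("don't know")


def combine(m1, m2):
    """Fuse two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # The two focal sets contradict each other.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Normalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m, hypothesis):
    """Total mass of focal sets fully contained in the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass of focal sets that do not contradict the hypothesis."""
    return sum(w for s, w in m.items() if s & hypothesis)


# Hypothetical masses from two independent sources of evidence.
content_evidence = {SPAM: 0.6, EITHER: 0.4}
link_evidence = {SPAM: 0.5, HAM: 0.2, EITHER: 0.3}

fused = combine(content_evidence, link_evidence)
print("belief(spam)       =", round(belief(fused, SPAM), 4))
print("plausibility(spam) =", round(plausibility(fused, SPAM), 4))
```

Note that neither source needs a prior probability of spam: a source that knows nothing simply puts its mass on `EITHER`. The gap between belief and plausibility is the uncertainty that remains after fusion.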
