Search engine optimization (SEO) techniques are often abused to promote websites among search results, a practice known as blackhat SEO. In this paper we tackle a newly emerging and especially aggressive class of blackhat SEO, namely search poisoning. Unlike other blackhat SEO techniques, which typically attempt to promote a website's ranking only under a limited set of search keywords relevant to the website's content, search poisoning techniques disregard any term relevance constraint and are employed to poison popular search keywords with the sole purpose of diverting large numbers of users to short-lived, traffic-hungry websites for malicious purposes. To accurately detect search poisoning cases, we designed a novel detection system called SURF. SURF runs as a browser component to extract a number of robust (i.e., difficult to evade) detection features from search-then-visit browsing sessions, and is able to accurately classify malicious search user redirections resulting from users clicking on poisoned search results. Our evaluation on real-world search poisoning instances shows that SURF can achieve a detection rate of 99.1% at a false positive rate of 0.9%. Furthermore, we applied SURF to analyze a large dataset of search-related browsing sessions collected over a period of seven months starting in September 2010. Through this long-term measurement study we were able to reveal new trends and interesting patterns related to a great variety of poisoning cases, thus contributing to a better understanding of the prevalence and gravity of the search poisoning problem.
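For readers unfamiliar with the two evaluation metrics quoted above, the following is a minimal sketch of how a detection rate and a false positive rate are conventionally computed from labeled sessions. It is not SURF's code: the names `Counts`, `tally`, `detection_rate`, and `false_positive_rate` are hypothetical, and the sample labels are made-up illustration data, not results from the evaluation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Counts:
    """Confusion-matrix counts; 'positive' means a poisoned redirection."""
    tp: int  # poisoned sessions flagged as poisoned
    fn: int  # poisoned sessions missed
    fp: int  # benign sessions flagged as poisoned
    tn: int  # benign sessions left alone


def tally(labels: Sequence[bool], predictions: Sequence[bool]) -> Counts:
    """Count outcomes, given ground-truth labels and classifier output."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    counts = Counts(tp=0, fn=0, fp=0, tn=0)
    for truth, guess in zip(labels, predictions):
        if truth and guess:
            counts.tp += 1
        elif truth and not guess:
            counts.fn += 1
        elif not truth and guess:
            counts.fp += 1
        else:
            counts.tn += 1
    return counts


def detection_rate(c: Counts) -> float:
    """Fraction of truly poisoned sessions that were detected."""
    if c.tp + c.fn == 0:
        raise ValueError("no poisoned sessions in the sample")
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: Counts) -> float:
    """Fraction of benign sessions that were wrongly flagged."""
    if c.fp + c.tn == 0:
        raise ValueError("no benign sessions in the sample")
    return c.fp / (c.fp + c.tn)


if __name__ == "__main__":
    # Made-up illustration data: True = poisoned, False = benign.
    truth = [True, True, False, False, False]
    guess = [True, False, True, False, False]
    c = tally(truth, guess)
    print(detection_rate(c), false_positive_rate(c))
```

Under this convention, the reported figures mean that 99.1% of the poisoned redirections in the evaluation set were flagged, while 0.9% of the benign sessions were flagged in error.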