Paper Information - Intention and Origination: An Inside Look at Large-Scale Bot Queries

Modern attackers increasingly exploit search engines as a vehicle to identify vulnerabilities and to gather information for launching new attacks. In this paper, we perform a large-scale quantitative analysis of bot queries received by the Bing search engine over month-long periods. Our analysis is based on an automated system, called SBotScope, that we developed to dissect large-scale bot queries. Specifically, we answer two questions: “what are the bot queries searching for?” and “who is submitting these queries?”. Our study shows that 33% of bot queries search for vulnerabilities, followed by 11% that harvest user account information. In one of our 16-day datasets, we uncover 8.2 million hosts from botnets and 13,364 hosts from data centers submitting bot queries. To the best of our knowledge, our work is the first large-scale effort toward systematically understanding bot query intentions and the scale of the malicious attacks associated with them.

Fang Yu | Yinglian Xie | Wenke Lee | Junjie Zhang | David Soukal

[1] John Platt, et al. Classification of Automated Web Traffic, 2009.

[2] Farnam Jahanian, et al. The Zombie Roundup: Understanding, Detecting, and Disrupting Botnets, 2005, SRUTI.

[3] Qifa Ke, et al. SBotMiner: large scale search bot detection, 2010, WSDM '10.

[4] Xiaofei He, et al. Regularized query classification using search click information, 2008, Pattern Recognit.

[5] Niels Provos, et al. Search worms, 2006, WORM '06.

[6] Ophir Frieder, et al. Automatic classification of Web queries using very large unlabeled query logs, 2007, TOIS.

[7] Gregory Buehrer, et al. A large-scale study of automated web search traffic, 2008, AIRWeb '08.

[8] Enhong Chen, et al. Context-aware query classification, 2009, SIGIR.

[9] Ophir Frieder, et al. Improving automatic query classification via semi-supervised learning, 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[10] Guofei Gu, et al. A Large-Scale Empirical Study of Conficker, 2012, IEEE Transactions on Information Forensics and Security.

[11] Michalis Vazirgiannis, et al. On Clustering Validation Techniques, 2001, Journal of Intelligent Information Systems.

[12] David J. C. MacKay, et al. Information Theory, Inference, and Learning Algorithms, 2004, IEEE Transactions on Information Theory.

[13] Ji-Rong Wen, et al. Query clustering using user logs, 2002, TOIS.

[14] Wenke Lee, et al. SURF: detecting and measuring search poisoning, 2011, CCS '11.

[15] Neil Daswani, et al. The Anatomy of Clickbot.A, 2007, HotBots.

[16] Martín Abadi, et al. deSEO: Combating Search-Result Poisoning, 2011, USENIX Security Symposium.

[17] Francesco Bonchi, et al. Do you want to take notes?: identifying research missions in Yahoo! search pad, 2010, WWW '10.

[18] Hongwen Kang, et al. Large-scale bot detection for search engines, 2010, WWW '10.

[19] Martín Abadi, et al. Heat-seeking honeypots: design and experience, 2011, WWW.

[20] Martín Abadi, et al. Searching the Searchers with SearchAudit, 2010, USENIX Security Symposium.

[21] Geoff Hulten, et al. Spamming botnets: signatures and characteristics, 2008, SIGCOMM '08.

[22] Filippo Menczer, et al. Behavior-driven clustering of queries into topics, 2011, CIKM '11.

[23] Vern Paxson, et al. What's Clicking What? Techniques and Innovations of Today's Clickbots, 2011, DIMVA.

[24] Michael I. Jordan, et al. Learning Spectral Clustering, With Application To Speech Separation, 2006, J. Mach. Learn. Res.

[25] Doug Beeferman, et al. Agglomerative clustering of a search engine query log, 2000, KDD '00.