Intention and Origination: An Inside Look at Large-Scale Bot Queries

Modern attackers increasingly use search engines to identify vulnerabilities and to gather information for launching new attacks. In this paper, we perform a large-scale quantitative analysis of bot queries received by the Bing search engine over month-long periods. Our analysis is based on SBotScope, an automated system we developed to dissect large-scale bot queries. Specifically, we answer two questions: “what are the bot queries searching for?” and “who is submitting these queries?” Our study shows that 33% of bot queries search for vulnerabilities, and a further 11% harvest user account information. In one of our 16-day datasets, we uncover 8.2 million hosts from botnets and 13,364 hosts from data centers submitting bot queries. To the best of our knowledge, our work is the first large-scale effort to systematically understand bot query intentions and the scale of the malicious attacks associated with them.
