SLEUTH: Single-pubLisher attack dEtection Using correlaTion Hunting

Several data management challenges arise in Internet advertising networks, where advertisers pay publishers to display advertisements on their Web sites and to drive traffic to the advertisers through surfers' clicks. Although this model lets advertisers target appropriate market segments, it also allows dishonest publishers to defraud advertisers by simulating fake traffic to their own sites in order to claim more revenue. This paper addresses the case of publishers launching fraud attacks from numerous machines, which is the most widespread scenario. The difficulty of uncovering these attacks is proportional to the number of machines and resources exploited by the fraudsters. In general, detecting this class of fraud entails solving a new data mining problem: finding correlations in multidimensional data. Because the dimensions have large cardinalities, the search space is huge, which has long allowed dishonest publishers to inflate their traffic and deplete advertisers' budgets. We devise the approximate SLEUTH algorithms to solve the problem efficiently and to uncover single-publisher frauds. We demonstrate the effectiveness of SLEUTH both analytically and by reporting some of its results on the Fastclick network, where numerous fraudsters were discovered.
