On Hit Inflation Techniques and Detection in Streams of Web Advertising Networks

Click fraud is jeopardizing the Internet advertising industry. Internet advertising is crucial to the health of the entire Internet: it lets producers advertise their products, and hence contributes to the well-being of e-commerce. Moreover, advertising supports the intellectual value of the Internet by covering the running expenses of content publishers' sites. Some publishers are dishonest and use automation to generate traffic that defrauds advertisers. Similarly, some advertisers automate clicks on their competitors' advertisements to deplete the competitors' advertising budgets. In this paper, we describe the advertising network model and discuss fraud, which is an integral problem in such a setting. We propose using online algorithms on aggregate data to detect automated traffic accurately and proactively while preserving surfers' privacy and leaving the industry model unaltered. We provide a complete classification of hit inflation techniques and devise stream analysis techniques that detect a variety of fraud attacks. We abstract the detection of some classes of fraud attacks as theoretical stream analysis problems, which we bring to the data management research community as open problems. We outline a framework for deploying the proposed detection algorithms on a generic architecture. We conclude with some successful preliminary findings from our attempt to detect fraud on a real network.
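To make the idea of an online algorithm over a click stream concrete, the following is a minimal illustrative sketch, not the method proposed in the paper. It flags probable duplicate clicks in a single pass using a Bloom filter, so that only a bit array is retained rather than per-visitor records. The names `BloomFilter` and `flag_duplicate_clicks`, the `(ad_id, visitor_id)` click format, and the filter sizes are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def flag_duplicate_clicks(clicks):
    """Yield (click, is_probable_duplicate) for each click in the stream.

    Each click is a hypothetical (ad_id, visitor_id) pair; visitor_id is
    assumed to be an opaque identifier rather than personal data.
    """
    seen = BloomFilter()
    for click in clicks:
        key = f"{click[0]}|{click[1]}"
        duplicate = key in seen  # checked before insertion
        seen.add(key)
        yield click, duplicate


if __name__ == "__main__":
    stream = [("ad1", "u1"), ("ad2", "u1"), ("ad1", "u1")]
    for click, duplicate in flag_duplicate_clicks(stream):
        print(click, "probable duplicate" if duplicate else "new")
```

The filter never forgets, so over a long stream its false-positive rate grows; a deployed detector would need some form of windowing or expiry, which this sketch omits.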
