Using search engines to retrieve information has become an important part of people's daily lives. For most search engines, click information is a significant factor in document ranking. As a result, some websites cheat to obtain a higher rank by fraudulently increasing clicks on their pages in search results for commercial gain, a practice called “click spam”. Based on an analysis of the features of cheating clicks, this paper proposes a novel automatic click spam detection approach consisting of three steps: 1) detecting spam at the level of single click records, which flags 0.54% of all clicks as spam; 2) modeling user sessions as sequences of triples, which, to the best of our knowledge, is the first approach in related research to consider not only the user action but also the action object and the time interval between actions; 3) finding seed cheating session modes based on the detected single click record spam and other features, and then applying an iterative bipartite graph algorithm to improve the precision and recall of click spam detection. Experiments were conducted on real log data from a Chinese commercial search engine, containing around 80 million user clicks per day. As a result, 2.1% of all clicks are detected as spam, with a precision of 97%. The proposed framework detects click spam precisely and efficiently and can be easily implemented in a real-world commercial search engine service.
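Steps 2 and 3 can be illustrated with a minimal sketch. It is not the paper's implementation: the event format, the time bucketing in `to_triples`, the choice of users as the second side of the bipartite graph, and the `threshold` and `max_iters` parameters in `propagate` are all assumptions made here for illustration.

```python
from collections import defaultdict


def to_triples(events, bucket_seconds=10):
    """Encode one session as a tuple of (action, object, interval_bucket) triples.

    events: list of (timestamp_seconds, action, obj), sorted by time (assumed format).
    The interval is the gap since the previous event, coarsened into buckets
    so that similar timings map to the same session mode.
    """
    triples = []
    prev_ts = None
    for ts, action, obj in events:
        gap = 0 if prev_ts is None else ts - prev_ts
        triples.append((action, obj, int(gap // bucket_seconds)))
        prev_ts = ts
    return tuple(triples)


def propagate(user_modes, seed_modes, threshold=0.5, max_iters=10):
    """Grow seed cheating modes over a user/mode bipartite graph (assumed design).

    user_modes: dict mapping user_id -> list of session modes (hashable values).
    A user is flagged when at least `threshold` of their sessions use flagged
    modes; a mode is flagged when at least `threshold` of its users are flagged.
    """
    mode_users = defaultdict(set)
    for user, modes in user_modes.items():
        for m in modes:
            mode_users[m].add(user)

    spam_modes = set(seed_modes)
    spam_users = set()
    for _ in range(max_iters):
        new_users = {
            u for u, ms in user_modes.items()
            if ms and sum(m in spam_modes for m in ms) / len(ms) >= threshold
        }
        new_modes = set(spam_modes)  # seeds and earlier detections are kept
        for m, users in mode_users.items():
            if len(users & new_users) / len(users) >= threshold:
                new_modes.add(m)
        if new_users == spam_users and new_modes == spam_modes:
            break  # fixed point reached
        spam_users, spam_modes = new_users, new_modes
    return spam_users, spam_modes
```

In this sketch, each session would first be converted with `to_triples`, and the resulting tuples would serve as the modes passed to `propagate`. The threshold controls how aggressively labels spread: a low value lets a single seed cascade across most of the graph, so it would need tuning against labeled data.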