Addressing Malicious Noise in Clickthrough Data

Clickthrough logs are an increasingly common source of training data for learning ranking functions. Because position in search results has a large impact on commercial websites, malicious noise is bound to appear in search engine click logs. We present preliminary work on addressing this form of noise, which we term click-spam. We analyze click-spam from a utility standpoint and investigate whether personalizing web search results by partitioning the user population can reduce or eliminate the financial incentives for potential spammers. We formalize click-spam, analyze the incentives for malicious agents, and then explore the model through examples.