An EM Algorithm for Click Fraud Detection

This paper addresses the problem of click fraud detection. We assume that each visitor to a website carries a latent indicator that labels the visitor as either a regular or a malicious user. Information such as the number of clicks, the number of page views (PVs), and the time difference between consecutive clicks is incorporated into our newly proposed statistical model. We allow these random variables to follow the same family of distributions, but with different parameters depending on the visitor's type. An EM algorithm is then proposed to obtain the maximum likelihood estimator. Click fraud detection can then be carried out by estimating each visitor's posterior probability of being malicious. Simulation studies are conducted to assess the finite-sample performance. We also demonstrate the usefulness of the proposed method through an empirical analysis of a real-life example in search engine marketing.
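To make the idea concrete, the following is a minimal sketch of EM for a two-component mixture, not the model proposed in the paper. It makes several simplifying assumptions that the abstract does not state: only the click count of each visitor is used, click counts are taken to be Poisson distributed, and the malicious group is initialized with the higher rate. The names `fit_em`, `posterior_malicious`, `lam_reg`, and `lam_mal` are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of count k at rate lam, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def posterior_malicious(k, pi, lam_reg, lam_mal):
    """Posterior probability that a visitor with k clicks is malicious."""
    mal = pi * poisson_pmf(k, lam_mal)
    reg = (1.0 - pi) * poisson_pmf(k, lam_reg)
    return mal / (mal + reg)


def fit_em(counts, n_iter=500, tol=1e-10):
    """Fit a two-component Poisson mixture by EM.

    Assumes the counts have a positive mean and that both groups keep
    nonzero weight during the iterations. Returns (pi, lam_reg, lam_mal),
    where pi is the estimated proportion of malicious visitors.
    """
    n = len(counts)
    mean = sum(counts) / n
    # Assumed starting values: equal weights, malicious rate set higher.
    pi, lam_reg, lam_mal = 0.5, 0.5 * mean, 1.5 * mean
    prev_ll = -math.inf
    for _ in range(n_iter):
        # Observed-data log-likelihood under the current parameters.
        ll = sum(
            math.log(pi * poisson_pmf(k, lam_mal)
                     + (1.0 - pi) * poisson_pmf(k, lam_reg))
            for k in counts
        )
        # E-step: posterior malicious probability of each visitor.
        post = [posterior_malicious(k, pi, lam_reg, lam_mal) for k in counts]
        # M-step: weighted maximum likelihood updates.
        total = sum(post)
        pi = total / n
        lam_mal = sum(p * k for p, k in zip(post, counts)) / total
        lam_reg = sum((1.0 - p) * k for p, k in zip(post, counts)) / (n - total)
        # Stop once the log-likelihood no longer improves noticeably.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, lam_reg, lam_mal


# Toy click counts (illustrative only, not data from the paper):
# 200 low-activity visitors and 60 high-activity visitors.
toy_counts = [1, 2, 3, 2, 1, 2, 3, 1, 2, 2] * 20 + [14, 15, 16, 15, 17, 13] * 10
pi_hat, lam_reg_hat, lam_mal_hat = fit_em(toy_counts)
score = posterior_malicious(15, pi_hat, lam_reg_hat, lam_mal_hat)
```

In this sketch, a visitor would be flagged when the posterior score exceeds a chosen threshold. Extending it to page views and inter-click times would require a joint likelihood for all three variables, which the abstract does not specify.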