Web Spam Identification with User Browsing Graph

Combating Web spam has become one of the top challenges for Web search engines. Most previous research on link-based Web spam identification focuses on exploiting hyperlink graphs and corresponding user-behavior models. However, because Web spammers can easily add and remove hyperlinks, the hyperlink graph is unreliable. We construct a user browsing graph from users' Web access logs and apply link analysis algorithms to this graph to identify Web spam pages. The constructed graph is much smaller than the original Web graph, and link analysis algorithms run efficiently on it. Comparative experimental results also show that algorithms run on the constructed graph outperform those run on the original graph.
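
To make the idea concrete, the following is a minimal sketch of how a browsing graph might be built from access-log transitions and then scored with a link analysis algorithm. It is an illustration under stated assumptions, not the construction used in this work: the log format (pairs of source and destination URLs), the function names `build_browsing_graph` and `pagerank`, the use of click counts as edge weights, and the choice of weighted PageRank with a damping factor of 0.85 are all hypothetical.

```python
from collections import defaultdict


def build_browsing_graph(log_records):
    """Build a weighted directed graph from browsing transitions.

    Assumption: each record is a (source_url, destination_url) pair taken
    from a Web access log. The edge weight is the number of times users
    were observed moving from the source page to the destination page.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst in log_records:
        if src == dst:
            continue  # ignore reloads of the same page
        graph[src][dst] += 1
        graph[dst]  # touching the key registers dst as a node
    # Return plain dicts so later lookups cannot create nodes by accident.
    return {node: dict(edges) for node, edges in graph.items()}


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (one possible link analysis)."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        dangling_mass = 0.0
        for src in nodes:
            out_edges = graph[src]
            total_weight = sum(out_edges.values())
            if total_weight == 0:
                # Pages with no observed outgoing clicks spread their
                # rank uniformly over all nodes.
                dangling_mass += rank[src]
                continue
            for dst, weight in out_edges.items():
                new_rank[dst] += damping * rank[src] * weight / total_weight
        for node in nodes:
            new_rank[node] += damping * dangling_mass / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical transitions; real logs would need parsing and cleaning.
    records = [("a", "b"), ("a", "b"), ("c", "b"), ("b", "a")]
    scores = pagerank(build_browsing_graph(records))
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

Because edges exist only where users actually navigated, pages that receive many hyperlinks but little real traffic gain no rank from those hyperlinks in this sketch. How the resulting scores are turned into a spam decision is left open here.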