Paper information - Using evidence based content trust model for spam detection

Content trust is one of the main components of information retrieval research. As it gets easier to add information to the Web via HTML pages, wikis, blogs, and other documents, it gets harder to distinguish accurate or trustworthy information from inaccurate or untrustworthy information on the Web. Current spam detection technology is based on a binary metric; that is, spam detection is treated as a binary classification problem. To meet users' needs and preferences, a more accurate metric is needed for content trust as well as for detecting spam. In this paper, we use the notion of content trust for spam detection and regard it as a ranking problem. Besides traditional text feature attributes, evidence based on information quality is introduced to define the trust features of spam information, and a novel content trust learning algorithm based on this evidence is proposed. Finally, a Web spam detection system is developed and experiments on real Web data are carried out, which show that the proposed method performs very well in practice.

Guosun Zeng | Wei Wang | Daizhong Tang

[1] Marc Najork, et al. Detecting spam web pages through content analysis, 2006, WWW '06.

[2] Alexander Pretschner, et al. Ontology-based web site mapping for information exploration, 1999, CIKM '99.

[3] Hector Garcia-Molina, et al. Web Spam Taxonomy, 2005, AIRWeb.

[4] Yolanda Gil, et al. Towards content trust of web resources, 2006, WWW '06.

[5] Rajeev Motwani, et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.

[6] Quan Zhang, et al. EviRank: An Evidence Based Content Trust Model for Web Spam Detection, 2007, APWeb/WAIM Workshops.

[7] Ramanathan V. Guha, et al. Propagation of trust and distrust, 2004, WWW '04.

[8] Wang Wei, et al. Trusted dynamic level scheduling based on Bayes trust model, 2007, Science in China Series F: Information Sciences.

[9] Hector Garcia-Molina, et al. Combating Web Spam with TrustRank, 2004, VLDB.

[10] Hector Garcia-Molina, et al. Link spam detection based on mass estimation, 2006, VLDB.

[11] Brian D. Davison, et al. Topical TrustRank: using topicality to combat web spam, 2006, WWW '06.