Paper Information - Detecting Comment Spam through Content Analysis

Detecting Comment Spam through Content Analysis

In the Web 2.0 era, individual Internet users can also act as information providers, publishing information or posting comments with ease. However, some participants spread irresponsible remarks or post irrelevant comments for commercial gain. This so-called comment spam severely degrades information quality. This paper attempts to detect comment spam automatically through content analysis, using several previously undescribed features. Experiments on a real data set show that our combined heuristics identify comment spam with high precision (90.4%) and recall (84.5%).

Yan Zhang | Congrui Huang | Qiancheng Jiang

[1] Gilad Mishne, et al. Blocking Blog Spam with Language Model Disagreement, 2005, AIRWeb.

[2] Paolo Massa, et al. Page-reRank: using trusted links to re-rank authority, 2005, The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05).

[3] Gilad Mishne, et al. Leave a Reply: An Analysis of Weblog Comments, 2006.

[4] Adam Thomason. Blog Spam: A Review, 2007, CEAS.

[5] Charlie Lindahl, et al. Weblogs: Simplifying Web Publishing, 2003, Computer.

[6] Gordon V. Cormack, et al. Spam filtering for short messages, 2007, CIKM '07.

[7] Marc Najork, et al. Detecting spam web pages through content analysis, 2006, WWW '06.

[8] S. R. Hiltz. The Network Nation, 1978.

[9] D. Sculley, et al. Relaxed online SVMs for spam filtering, 2007, SIGIR.

[10] Ravi Jayagopal. No Business Like E-Business: The Spectacularly Simple Secrets Behind How You Can Create a Web Site and Make Money with It, 2007.

[11] Brian D. Davison, et al. Detection of Harassment on Web 2.0, 2009.

[12] Georgia Koutrika, et al. Fighting Spam on Social Web Sites: A Survey of Approaches and Future Challenges, 2007, IEEE Internet Computing.