Paper information - Link based small sample learning for web spam detection

Link based small sample learning for web spam detection

Robust web spam detection systems based on statistical learning often require large amounts of labeled training data. However, labeled samples are more difficult, expensive, and time-consuming to obtain than unlabeled ones. This paper proposes link-based semi-supervised learning algorithms that boost the performance of a classifier by integrating traditional self-training with link learning based on topological dependency. Experiments with a few labeled samples on the standard WEBSPAM-UK2006 benchmark show that the algorithms are effective.
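The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of combining self-training with link information, not the authors' method. Everything in it is an assumption made for illustration: the nearest-centroid scorer, the `alpha` mixing weight, the `per_round` promotion count, the toy hosts, and the requirement that the seed labels contain at least one spam (1) and one normal (0) host.

```python
from statistics import mean


def centroid_scores(features, labeled):
    """Score every host in [0, 1]; higher means closer to the spam centroid.

    Hypothetical stand-in for the base classifier (assumption, not from the paper).
    """
    def centroid(cls):
        rows = [features[h] for h, y in labeled.items() if y == cls]
        return [mean(col) for col in zip(*rows)]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    spam_c, normal_c = centroid(1), centroid(0)
    scores = {}
    for host, x in features.items():
        d_spam, d_norm = dist(x, spam_c), dist(x, normal_c)
        total = d_spam + d_norm
        scores[host] = 0.5 if total == 0 else d_norm / total
    return scores


def link_self_training(features, links, seed_labels,
                       rounds=5, per_round=2, alpha=0.5):
    """Self-training loop whose confidence also uses linked neighbours."""
    labeled = dict(seed_labels)
    for _ in range(rounds):
        unlabeled = [h for h in features if h not in labeled]
        if not unlabeled:
            break
        scores = centroid_scores(features, labeled)
        combined = {}
        for host in unlabeled:
            neighbours = links.get(host, [])
            # A neighbour contributes its label if known, else its current score.
            if neighbours:
                link_score = mean(labeled.get(n, scores[n]) for n in neighbours)
            else:
                link_score = scores[host]
            combined[host] = alpha * scores[host] + (1 - alpha) * link_score
        # Promote the predictions farthest from the 0.5 decision boundary.
        ranked = sorted(unlabeled, key=lambda h: abs(combined[h] - 0.5),
                        reverse=True)
        for host in ranked[:per_round]:
            labeled[host] = 1 if combined[host] >= 0.5 else 0
    return labeled


# Toy data (invented): two content features per host and two link clusters.
features = {
    "a": [0.9, 0.8], "b": [0.8, 0.9], "c": [0.7, 0.7],
    "x": [0.1, 0.2], "y": [0.2, 0.1], "z": [0.3, 0.3],
}
links = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}
seeds = {"a": 1, "x": 0}  # one labeled spam host, one labeled normal host
result = link_self_training(features, links, seeds)
```

In this sketch, the link term lets a host inherit evidence from its neighbours, so hosts in a densely interlinked cluster around a labeled seed tend to be promoted earlier than isolated ones. How the paper actually weights content and link evidence is not stated in the abstract.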

Xinchang Zhang | Qiudan Li | Guanggang Geng

[1] Chunheng Wang, et al. Improving web spam detection with re-extracted features, 2008, WWW.

[2] Eugene Charniak, et al. Effective Self-Training for Parsing, 2006, NAACL.

[3] Fabrizio Silvestri, et al. Know your neighbors: web spam detection using the web topology, 2007, SIGIR.