Networks are everywhere. The World Wide Web, for example, is a scale-free network whose vertices are Web pages and files and whose edges are the hyperlinks between them. Search engines such as Google exploit this link structure to help people locate resources efficiently. However, search engine spam greatly degrades their performance. To handle search engine spam, and link farm spam in particular, we propose using two properties of the Web graph: degree distribution and average path length. The approach rests on the observation that a normal Website is a scale-free network, whereas an exceptional spam Website is close to a fully connected network, so the values of these properties differ markedly between the two. Our experiments show that such exceptional Websites consist largely of spam pages and that the property-based approach is effective at detecting link farms, which in turn enables search engines to provide more relevant results to users.
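As a rough illustration of the two properties named above, the following minimal sketch computes a degree histogram and the average path length for a small site graph, then flags graphs that are dense and have very short paths. It is not the procedure used in the experiments. It assumes an undirected adjacency-set representation, and the function names, the `density` helper, and both thresholds are hypothetical choices made only for this example.

```python
from collections import Counter, deque


def degree_histogram(adj):
    """Map each degree k to the number of vertices having that degree."""
    return Counter(len(neighbors) for neighbors in adj.values())


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered vertex pairs."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    edges = sum(len(neighbors) for neighbors in adj.values()) / 2
    return edges / (n * (n - 1) / 2) if n > 1 else 0.0


def looks_like_link_farm(adj, density_threshold=0.8, path_threshold=1.3):
    """Hypothetical rule: dense graph with very short average paths."""
    return (density(adj) >= density_threshold
            and average_path_length(adj) <= path_threshold)


# Toy site graphs (illustrative only, not data from the experiments).
clique = {i: {j for j in range(5) if j != i} for i in range(5)}
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

print(looks_like_link_farm(clique), looks_like_link_farm(star))
```

On these toy graphs the fully connected clique has an average path length of 1 and a single degree value, while the hub-and-spoke graph has longer paths and a skewed degree histogram. That contrast is what the property-based approach relies on; real thresholds would have to be calibrated on labelled Websites.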