Web spam detection based on fitting the distribution of features

Web spam prevents users from obtaining information normally. To detect spam pages effectively, the distributions of web content features and link features are analyzed. The results show that the features of normal pages follow regular distributions, whereas the features of spam pages are scattered. Based on this difference, a function is fitted to the distribution of each normal-page feature, and the difference between the observed proportion of pages and the fitted distribution function is calculated. Finally, a C4.5 decision tree is constructed to detect spam pages, using this difference as the threshold. Experimental results show that the method detects spam pages effectively.
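
The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the exponential form of the fitted function, the toy proportions, and the helper names fit_exponential, deviation, and best_threshold are all assumptions made for illustration, and a single information-gain split stands in for a full C4.5 tree.

```python
import math

def fit_exponential(xs, ps):
    """Least-squares fit of p(x) = a * exp(b * x), done as a linear fit on log p.
    The exponential form is an assumption; the abstract does not name the function."""
    ys = [math.log(p) for p in ps]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b

def deviation(x, proportion, a, b):
    """Absolute difference between an observed proportion and the fitted curve."""
    return abs(proportion - a * math.exp(b * x))

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * math.log2(p)
    return h

def best_threshold(values, labels):
    """Pick the split point on one numeric attribute that maximizes information gain.
    This is a one-level stand-in for a decision tree, not a full C4.5 learner."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = -1.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split point between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t

# Toy data (hypothetical): proportion of normal pages per feature bin.
normal_bins = [1, 2, 3, 4, 5]
normal_props = [0.30, 0.22, 0.16, 0.12, 0.09]
a, b = fit_exponential(normal_bins, normal_props)

# Toy labeled observations (hypothetical): (feature bin, observed proportion, label).
observed = [(1, 0.31, "normal"), (2, 0.21, "normal"), (3, 0.40, "spam"), (4, 0.35, "spam")]
devs = [deviation(x, p, a, b) for x, p, _ in observed]
labels = [label for _, _, label in observed]
print("threshold on deviation:", best_threshold(devs, labels))
```

In this sketch, a page whose deviation from the fitted curve exceeds the learned threshold would be flagged as spam. A real system would compute one deviation per feature and let the tree combine them.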