A unified expanding method for content-ignorant web page clustering

The content-ignorant clustering method has advantages in time and space complexity over content-based methods. In this paper, the authors introduce a unified expanding method for content-ignorant Web page clustering that mines the "clickthrough" log and aims to address the sparsity of that log. The relationship between two expanded nodes is also defined and optimized. Analysis and experiments show that the new method performs better than the standard content-ignorant method. The new method can also work without iterative clustering.
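To make the sparsity problem concrete, the sketch below illustrates the general idea only. It is not the authors' algorithm, whose details the abstract does not give. It assumes the log is a list of (query, URL) click pairs, that page similarity is the Jaccard overlap of the query sets leading to each page (a common content-ignorant baseline), and that "expanding" means a one-hop merge of query sets from pages that share a query. The function names `build_url_queries`, `jaccard`, and `expand` and the sample log are hypothetical.

```python
from collections import defaultdict


def build_url_queries(log):
    """Map each URL to the set of queries that led to a click on it."""
    url_queries = defaultdict(set)
    for query, url in log:
        url_queries[url].add(query)
    return dict(url_queries)


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand(url_queries):
    """Assumed one-hop expansion: each URL also inherits the queries of
    every URL that shares at least one query with it."""
    # Invert the mapping: query -> URLs clicked for that query.
    query_urls = defaultdict(set)
    for url, queries in url_queries.items():
        for q in queries:
            query_urls[q].add(url)

    expanded = {}
    for url, queries in url_queries.items():
        merged = set(queries)
        for q in queries:
            for neighbour in query_urls[q]:
                merged |= url_queries[neighbour]
        expanded[url] = merged
    return expanded


# Hypothetical sparse log: a.com and c.com never share a query directly.
log = [
    ("jaguar car", "a.com"),
    ("jaguar car", "b.com"),
    ("jaguar price", "b.com"),
    ("jaguar price", "c.com"),
]
raw = build_url_queries(log)
grown = expand(raw)
before = jaccard(raw["a.com"], raw["c.com"])
after = jaccard(grown["a.com"], grown["c.com"])
```

In this toy log, a.com and c.com have no query in common, so their raw similarity is zero even though both are linked through b.com. After the assumed one-hop expansion their query sets coincide, which is the kind of effect an expanding step is meant to produce on sparse click data.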
