An Improved Weighted HITS Algorithm Based on Similarity and Popularity

The HITS algorithm is a popular and effective method for ranking documents using the link information among a set of documents. However, it assigns the same weight to every link, which leads to topic drift. In this paper, we generalize the notion of web page similarity and propose a query-induced similarity that describes how similar one web page is to another with respect to a query topic. We then propose an improved weighted HITS-based (I-HITS) algorithm that assigns appropriate weights to links according to the similarity and popularity of web pages. Experimental results indicate that I-HITS finds more relevant pages than HITS, ARC, and SALSA, improving relevance by 30% to 50%. Furthermore, it can avoid topic drift and effectively improve the quality of web search.
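The abstract does not give exact definitions of the query-induced similarity, the popularity measure, or how the two are combined, so the Python sketch below only illustrates the general idea of weighting links in HITS. Every weighting choice in it is a hypothetical assumption, not the authors' I-HITS formula: similarity is the cosine of term counts restricted to the query terms (`query_similarity`), popularity is in-degree scaled to [0, 1] (`popularity`), and a link p -> q gets weight `alpha * sim(p, q) + (1 - alpha) * pop(q)` with an assumed `alpha = 0.5`. The iteration follows the standard HITS hub/authority update, except that each link counts by its weight instead of 1.

```python
import math
from collections import Counter

# HYPOTHETICAL query-induced similarity (not the paper's definition):
# cosine similarity of two pages' term counts, restricted to the query terms.
def query_similarity(text_p, text_q, query_terms):
    tf_p = Counter(w for w in text_p.lower().split() if w in query_terms)
    tf_q = Counter(w for w in text_q.lower().split() if w in query_terms)
    dot = sum(tf_p[t] * tf_q[t] for t in query_terms)
    norm = math.sqrt(sum(v * v for v in tf_p.values())) * math.sqrt(
        sum(v * v for v in tf_q.values()))
    return dot / norm if norm else 0.0

# HYPOTHETICAL popularity: in-degree within the base set, scaled to [0, 1].
def popularity(links, pages):
    indeg = Counter(q for _, q in links)
    top = max(indeg.values(), default=0)
    return {p: indeg[p] / top if top else 0.0 for p in pages}

# HYPOTHETICAL link weight: convex mix of similarity and target popularity.
def link_weights(links, texts, query_terms, alpha=0.5):
    pop = popularity(links, texts)
    return {(p, q): alpha * query_similarity(texts[p], texts[q], query_terms)
            + (1 - alpha) * pop[q]
            for p, q in links}

def _normalize(scores):
    # Scale a score vector to unit L2 norm, as in standard HITS.
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm if norm else 0.0 for k, v in scores.items()}

# Weighted HITS: link p -> q contributes w(p, q) instead of 1.
def weighted_hits(pages, links, weights, iterations=50):
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of q: weighted sum of hub scores of pages linking to q.
        new_auth = dict.fromkeys(pages, 0.0)
        for p, q in links:
            new_auth[q] += weights[(p, q)] * hub[p]
        auth = _normalize(new_auth)
        # Hub of p: weighted sum of authority scores of pages p links to.
        new_hub = dict.fromkeys(pages, 0.0)
        for p, q in links:
            new_hub[p] += weights[(p, q)] * auth[q]
        hub = _normalize(new_hub)
    return hub, auth

# Toy base set: "c" is on-topic for the query, "d" is off-topic,
# and both receive links from the same two hubs.
texts = {
    "a": "jaguar car speed",
    "b": "jaguar car engine",
    "c": "jaguar car review",
    "d": "football club news",
}
links = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]
query = {"jaguar", "car"}

w = link_weights(links, texts, query)
_, auth = weighted_hits(texts, links, w)
print(round(auth["c"], 3), round(auth["d"], 3))  # 0.894 0.447

uniform = {link: 1.0 for link in links}
_, auth_u = weighted_hits(texts, links, uniform)
print(round(auth_u["c"], 3), round(auth_u["d"], 3))  # 0.707 0.707
```

With uniform weights (plain HITS), the off-topic page `d` ties with `c` purely because of link structure; once link weights reflect query similarity, `c` gets twice the authority of `d` in this toy example. Setting `alpha = 1` would use similarity alone, while `alpha = 0` would ignore the query entirely.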
