Improvement of HITS-based algorithms on web documents

In this paper, we present two ways to improve the precision of HITS-based algorithms on Web documents. First, by analyzing the limitations of current HITS-based algorithms, we propose a new weighted HITS-based method that assigns appropriate weights to in-links of root documents. Then, we combine content analysis with HITS-based algorithms and study the effects of four representative relevance scoring methods, VSM, Okapi, TLS, and CDR, using a set of broad topic queries. Our experimental results show that our weighted HITS-based method performs significantly better than Bharat's improved HITS algorithm. When we combine our weighted HITS-based method or Bharat's HITS algorithm with any of the four relevance scoring methods, the combined methods are only marginally better than our weighted HITS-based method alone. Among the four relevance scoring methods, there is no significant quality difference when they are combined with a HITS-based algorithm.
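To make the idea of weighting links in a HITS-style computation concrete, the following is a minimal sketch of the standard hub/authority iteration extended with per-link weights. It is not the paper's method: the abstract does not say how the in-link weights of root documents are derived, so the sketch simply takes them as input. The names `weighted_hits`, `normalize`, the `weights` dictionary, and the toy graph are all hypothetical and chosen for illustration only.

```python
import math


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(value * value for value in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {node: value / norm for node, value in scores.items()}


def weighted_hits(edges, weights=None, iterations=50):
    """Run a HITS-style iteration in which each link carries a weight.

    edges:   iterable of (source, target) pairs, one per hyperlink.
    weights: optional dict mapping (source, target) to a positive weight.
             Links not listed default to 1.0, which gives plain HITS.
             How to choose these weights is an assumption left to the caller.
    Returns (hub, authority) dictionaries keyed by node.
    """
    edges = list(edges)
    weights = weights or {}
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}

    for _ in range(iterations):
        # Authority update: sum the weighted hub scores of pages linking in.
        new_auth = {node: 0.0 for node in nodes}
        for source, target in edges:
            new_auth[target] += weights.get((source, target), 1.0) * hub[source]

        # Hub update: sum the weighted authority scores of pages linked to.
        new_hub = {node: 0.0 for node in nodes}
        for source, target in edges:
            new_hub[source] += weights.get((source, target), 1.0) * new_auth[target]

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Toy graph: pages "a" and "b" both link to "c"; "a" also links to "d".
toy_edges = [("a", "c"), ("b", "c"), ("a", "d")]
plain_hub, plain_auth = weighted_hits(toy_edges)
boosted_hub, boosted_auth = weighted_hits(toy_edges, weights={("a", "d"): 5.0})
```

With all weights left at 1.0, "c" receives the higher authority score because two pages link to it. Raising the weight on the single link into "d" reverses that ordering, which shows how link weights can change the ranking that an unweighted link count would produce.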

Wei Zhang | Longzhuang Li | Yi Shang

[1] Charles L. A. Clarke, et al. Shortest Substring Ranking (MultiText Experiments for TREC-4), 1995, TREC.

[2] Ophir Frieder, et al. Integrating Structured Data and Text: A Relational Approach, 1997, J. Am. Soc. Inf. Sci.

[3] Charles L. A. Clarke, et al. Relevance ranking for one to three term queries, 1997, Inf. Process. Manag.

[4] Jonathan Gratch, et al. On the Efficient Allocation of Resources for Hypothesis Evaluation: A Statistical Approach, 1995, IEEE Trans. Pattern Anal. Mach. Intell.

[5] Amanda Spink, et al. Real life information retrieval: a study of user queries on the Web, 1998, SIGF.

[6] Krishna Bharat, et al. Improved algorithms for topic distillation in a hyperlinked environment, 1998, SIGIR '98.

[7] David A. Cohn, et al. Creating customized authority lists, 1999, ICML 1999.

[8] Steven L. MacCall, et al. A Relevance-based Quantitative Measure for Internet Information Retrieval Evaluation, 1999.

[9] Shlomo Moran, et al. The stochastic approach for link-structure analysis (SALSA) and the TKC effect, 2000, Comput. Networks.

[10] Longzhuang Li, et al. A new statistical method for performance evaluation of search engines, 2000, Proceedings 12th IEEE International Conference on Tools with Artificial Intelligence. ICTAI 2000.

[11] Gary Marchionini, et al. A Comparative Study of Web Search Service Performance, 1996.

[12] Luis Gravano, et al. GlOSS: text-source discovery over the Internet, 1999, TODS.

[13] Ron Sacks-Davis, et al. Similarity Measures for Short Queries, 1995, TREC.

[14] Gerard Salton, et al. Document Length Normalization, 1995, Inf. Process. Manag.

[15] Andrew McCallum, et al. Learning to Create Customized Authority Lists, 2000, ICML.

[16] Jaideep Srivastava, et al. First 20 precision among World Wide Web search services (search engines), 1999.

[17] Longzhuang Li, et al. A new method for automatic performance comparison of search engines, 2004, World Wide Web.

[18] Jon M. Kleinberg, et al. Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text, 1998, Comput. Networks.

[19] Steve A. Chien, et al. Efficient Heuristic Hypothesis Ranking, 1999, J. Artif. Intell. Res.

[20] Gerard Salton, et al. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, 1989.

[21] Peter Bailey, et al. ACSys TREC-8 Experiments, 1999, TREC.

[22] David Hawking, et al. Overview of TREC-7 Very Large Collection Track, 1997, TREC.

[23] Peter Willett, et al. Estimating the recall performance of Web search engines, 1997.

[24] Joel C. Miller, et al. Modifications of Kleinberg's HITS algorithm using matrix exponentiation and web log records, 2001, SIGIR '01.