Investigate the use of Anchor-Text and of Query-Document Similarity Scores to Predict the Performance of Search Engine

Query difficulty prediction aims to estimate, in advance, whether the answers returned by a search engine in response to a query are likely to be useful. This paper proposes new predictors based on the similarity between the query and the answer documents, as calculated by three different retrieval models. It also examines anchor text-based document surrogates and how their similarity to queries can be used to estimate query difficulty. The predictors are evaluated on the WT10g collection of web data by correlating two effectiveness measures of the full-text retrieved results, 1) average precision (AP) and 2) precision at 10 (P@10), with two predictor inputs, 3) a similarity score computed over anchor text and 4) a similarity score computed over full text. Experiments show that five of the proposed predictors perform reliably and consistently across a variety of retrieval models.
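The evaluation described above, correlating a per-query predictor score with AP and P@10, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the correlation coefficient, so Pearson's r is assumed here, and the query identifiers, relevance flags, and predictor scores are hypothetical values invented for illustration.

```python
from math import sqrt


def average_precision(ranked_relevance, total_relevant):
    """AP for one query; ranked_relevance holds 0/1 flags in rank order."""
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant rank
    return precision_sum / total_relevant


def precision_at_k(ranked_relevance, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(ranked_relevance[:k]) / k


def pearson(xs, ys):
    """Pearson correlation (an assumed choice; the source does not name one)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


# Hypothetical per-query data: relevance flags of the top 10 full-text results,
# the number of relevant documents, and a similarity-based predictor score.
queries = {
    "q1": {"rels": [1, 1, 0, 1, 0, 0, 0, 0, 0, 0], "total_relevant": 3, "score": 12.4},
    "q2": {"rels": [0, 0, 1, 0, 0, 0, 0, 0, 0, 0], "total_relevant": 4, "score": 5.1},
    "q3": {"rels": [1, 0, 1, 0, 1, 0, 0, 0, 0, 0], "total_relevant": 5, "score": 9.8},
}

scores = [q["score"] for q in queries.values()]
ap_values = [average_precision(q["rels"], q["total_relevant"]) for q in queries.values()]
p10_values = [precision_at_k(q["rels"], 10) for q in queries.values()]

print("correlation with AP:  ", pearson(scores, ap_values))
print("correlation with P@10:", pearson(scores, p10_values))
```

A higher correlation would suggest that the predictor tracks retrieval effectiveness more closely; a rank-based coefficient could be substituted for `pearson` without changing the rest of the sketch.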
