Distance Measures in Query Space: How Strongly to Use Feedback From Past Queries

Feedback on past queries is a valuable resource for improving retrieval performance on new queries. We introduce a modular approach to incorporating feedback information into existing retrieval architectures. We propose to fuse the original ranking with the rankings returned by rerankers, each of which is trained on feedback given for a single, distinct query. Here, we examine the basic case of improving the original ranking of a query qtest using only one reranker: the one trained on feedback for the "closest" query qtrain. We examine the use of various distance measures between queries to first identify qtrain and then determine the best linear combination of the original ranker's and the reranker's ratings, that is, to find out which feedback to learn from and how strongly to use it. We show that the cosine distance between the term vectors of the two queries, each enriched by representations of the top N originally returned documents, reliably answers both questions. The fusion performs as well as or better than a) always using only the original ranker or only the reranker, b) selecting a hard distance threshold to decide between the two, or c) fusing results with a ratio that is globally optimized but fixed across all tested queries.
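
The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation; several details are assumptions made only for illustration: whitespace tokenization, raw term counts as the "representation" of the top N documents, scores from both rankers on a comparable scale, and a fusion weight of 1 - distance (the abstract does not state how distance maps to the mixing ratio). All function names are hypothetical.

```python
import math
from collections import Counter


def term_vector(query, top_docs):
    """Term-count vector of a query, enriched with its top-N returned documents.

    Assumption: plain whitespace tokenization and raw counts.
    """
    counts = Counter(query.lower().split())
    for doc in top_docs:
        counts.update(doc.lower().split())
    return counts


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is empty."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def closest_query(test_vec, train_vecs):
    """Return (query_id, distance) of the past query nearest to the test query."""
    return min(
        ((qid, cosine_distance(test_vec, vec)) for qid, vec in train_vecs.items()),
        key=lambda pair: pair[1],
    )


def fuse(original_scores, reranker_scores, distance):
    """Linearly combine per-document scores of the original ranker and the reranker.

    Assumption: the reranker weight is w = 1 - distance, clipped to [0, 1],
    so feedback from a distant query is used only weakly.
    """
    w = max(0.0, min(1.0, 1.0 - distance))
    return {
        doc: (1.0 - w) * original_scores[doc] + w * reranker_scores[doc]
        for doc in original_scores
    }


# Usage with toy data: pick the closest past query, then fuse the two score lists.
past = {
    "q_cars": term_vector("jaguar car", ["jaguar car dealer"]),
    "q_snakes": term_vector("python snake", ["python reptile"]),
}
test = term_vector("jaguar speed", ["jaguar car top speed"])
qtrain, dist = closest_query(test, past)
fused = fuse({"d1": 0.9, "d2": 0.4}, {"d1": 0.2, "d2": 0.8}, dist)
```

In this sketch the same distance serves both purposes named in the abstract: it selects qtrain and sets how strongly the reranker's scores influence the fused ranking. At distance 1 the fused scores equal the original ranker's scores, and at distance 0 they equal the reranker's.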
