Compact query term selection using topically related text

Many recent and highly effective retrieval models for long queries use query reformulation methods that jointly optimize term weights and term selection. These methods learn using word context and global context but typically fail to capture query context. In this paper, we present a novel term ranking algorithm, PhRank, that extends work on Markov chain frameworks for query expansion to select compact and focused terms from within the query itself. The resulting queries are focused enough that one to five terms in an unweighted model achieve better retrieval effectiveness than weighted term selection models that use up to 30 terms. PhRank terms are also compact, typically containing 1-2 words, whereas competing models use query subsets of up to 7 words. PhRank captures query context with an affinity graph constructed from word co-occurrence in pseudo-relevant documents. A random walk over the graph is combined with discrimination weights to rank terms. Empirical evaluation on newswire and web collections demonstrates that, compared to highly competitive information retrieval (IR) models, reformulated queries perform significantly better for long queries and at least as well for short keyword queries.
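
The pipeline described above (co-occurrence graph over pseudo-relevant documents, random walk, discrimination weighting) can be illustrated with a minimal sketch. This is not the PhRank algorithm as published: the window size, the PageRank-style damping factor, the IDF-style discrimination weight, and all function names and sample data are assumptions chosen for illustration, and the sketch ranks single words only rather than multi-word terms.

```python
import math
from collections import defaultdict


def build_affinity_graph(docs, window=3):
    """Count symmetric word co-occurrences inside a sliding window.

    `window` is a hypothetical parameter: each word is linked to the
    next `window - 1` words in the same document.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        tokens = doc.lower().split()
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + window]:
                if v != w:
                    graph[w][v] += 1.0
                    graph[v][w] += 1.0
    return graph


def random_walk_scores(graph, damping=0.85, iterations=50):
    """PageRank-style power iteration over the weighted, undirected graph."""
    nodes = list(graph)
    n = len(nodes)
    scores = {w: 1.0 / n for w in nodes}
    out_weight = {w: sum(graph[w].values()) for w in nodes}
    for _ in range(iterations):
        updated = {}
        for w in nodes:
            # Probability mass flowing into w from each neighbour v.
            incoming = sum(
                scores[v] * graph[v][w] / out_weight[v] for v in graph[w]
            )
            updated[w] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


def rank_query_terms(query, feedback_docs, doc_freq, num_docs, top_k=3):
    """Rank query words by walk score times an IDF-style weight (assumed)."""
    graph = build_affinity_graph(feedback_docs)
    walk = random_walk_scores(graph) if graph else {}
    ranked = []
    for word in dict.fromkeys(query.lower().split()):  # unique, order kept
        df = max(doc_freq.get(word, 1), 1)
        discrimination = math.log(num_docs / df)
        ranked.append((word, walk.get(word, 0.0) * discrimination))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical pseudo-relevant documents and collection statistics.
feedback_docs = [
    "solar panel efficiency improves with new solar cell coating",
    "researchers report solar cell efficiency gains",
    "new coating boosts solar panel output",
]
doc_freq = {"what": 900, "are": 950, "the": 990, "recent": 300, "gains": 120,
            "in": 980, "solar": 15, "cell": 40, "efficiency": 60}
query = "what are the recent gains in solar cell efficiency"

for term, score in rank_query_terms(query, feedback_docs, doc_freq, 1000):
    print(f"{term}\t{score:.4f}")
```

Query words that never occur in the feedback documents receive a walk score of zero here, so function words such as "the" drop out of the ranking without an explicit stopword list. A real system would need a more careful treatment of such words and of multi-word terms.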
