Recommending Search Queries in Documents Using Inter N-Gram Similarities

Reading a document can often trigger a need for additional information. For example, a reader of a news article might be interested in information about the persons and events mentioned in the article. Accordingly, there is a line of work on recommending search-engine queries given a document read by a user. Often, the recommended queries are selected from a query log independently of each other, and are presented to the user without any context. We address a novel query recommendation task where the recommended queries must be n-grams (sequences of consecutive terms) in the document. Furthermore, inspired by work on using inter-document similarities for document retrieval, we explore the merits of using inter n-gram similarities for query recommendation. Specifically, we use a supervised approach to learn an inter n-gram similarity measure where the goal is that n-grams that are likely to serve as queries will be deemed more similar to each other than to other n-grams. We use the similarity measure in a wide variety of query recommendation approaches which we devise as adaptations of ad hoc document retrieval techniques. Empirical evaluation performed using data gathered from Yahoo!'s search engine logs attests to the effectiveness of the resultant recommendation methods.
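To make the task setup concrete, the following is a minimal sketch, not the method evaluated in the paper. It enumerates a document's n-grams as candidate queries and then smooths an initial per-n-gram score with the scores of similar n-grams, loosely in the spirit of similarity-based re-ranking in ad hoc document retrieval. The function names, the whitespace tokenizer, the Jaccard term-overlap similarity (a stand-in for the learned inter n-gram similarity measure), the interpolation weight `lam`, and the initial scores are all assumptions introduced for illustration.

```python
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def candidate_ngrams(text: str, max_n: int = 3) -> List[NGram]:
    """Return the distinct n-grams (n = 1..max_n) of consecutive terms,
    ordered by n and then by first occurrence in the text."""
    terms = text.lower().split()  # assumption: naive whitespace tokenization
    seen = set()
    grams: List[NGram] = []
    for n in range(1, max_n + 1):
        for i in range(len(terms) - n + 1):
            gram = tuple(terms[i:i + n])
            if gram not in seen:
                seen.add(gram)
                grams.append(gram)
    return grams


def jaccard(a: NGram, b: NGram) -> float:
    """Term-overlap similarity; a placeholder for a learned similarity measure."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def smooth_scores(scores: Dict[NGram, float], lam: float = 0.5) -> Dict[NGram, float]:
    """Interpolate each n-gram's score with the similarity-weighted average
    of the other n-grams' scores."""
    smoothed: Dict[NGram, float] = {}
    for g, s in scores.items():
        weights = {h: jaccard(g, h) for h in scores if h != g}
        total = sum(weights.values())
        if total > 0:
            neighbor = sum(w * scores[h] for h, w in weights.items()) / total
        else:
            neighbor = s  # no similar n-grams: keep the original score
        smoothed[g] = lam * s + (1 - lam) * neighbor
    return smoothed


if __name__ == "__main__":
    grams = candidate_ngrams("rafting the urubamba river", max_n=2)
    # Hypothetical initial scores: longer n-grams score higher.
    initial = {g: float(len(g)) for g in grams}
    ranked = sorted(smooth_scores(initial).items(), key=lambda kv: -kv[1])
    for gram, score in ranked[:3]:
        print(" ".join(gram), round(score, 3))
```

In this sketch an n-gram with a low initial score is pulled upward when it shares terms with highly scored n-grams, which is the intuition behind using inter n-gram similarities; a learned similarity measure would replace `jaccard`.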
