Learning Multiple Intent Representations for Search Queries

Representation learning has long played an important role in information retrieval (IR) systems. Most retrieval models, including recent neural approaches, use representations to calculate similarities between queries and documents in order to find relevant information in a corpus. Recent models use large-scale pre-trained language models for query representation. The typical use of these models, however, has a major limitation: it produces only a single representation for a query, even though the query may have multiple intents or facets. This paper addresses that limitation by considering neural models that support multiple intent representations for each query. Specifically, we propose NMIR (Neural Multiple Intent Representations), a model that generates semantically different query intents together with their corresponding representations. We evaluate the model on query facet generation using a large-scale dataset of real user queries sampled from the Bing search logs. We also provide an extrinsic evaluation of the proposed model on a clarifying question selection task. The results show that NMIR significantly outperforms competitive baselines.
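To make the single-representation limitation concrete, the following is a minimal toy sketch and not the NMIR model itself. It assumes two-dimensional hand-picked vectors, cosine similarity as the matching function, and a max over intent vectors as the aggregation rule; all names (`score_single`, `score_multi`, the toy vectors) are hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def score_single(query_vec, doc_vec):
    """Score a document against one query representation."""
    return cosine(query_vec, doc_vec)

def score_multi(intent_vecs, doc_vec):
    """Score a document against several intent representations.

    Taking the max is an assumption made for this sketch; it is not
    claimed to be the aggregation used in the paper.
    """
    return max(cosine(v, doc_vec) for v in intent_vecs)

# Hypothetical toy vectors for an ambiguous query with two intents.
intent_a = [1.0, 0.0]        # first intent
intent_b = [0.0, 1.0]        # second intent
blended_query = [0.5, 0.5]   # a single vector that mixes both intents
doc_for_b = [0.0, 1.0]       # a document that matches only the second intent

single = score_single(blended_query, doc_for_b)
multi = score_multi([intent_a, intent_b], doc_for_b)
print(f"single representation: {single:.3f}")
print(f"multiple intent representations: {multi:.3f}")
```

In this toy setting the blended vector matches the document only partially, while the matching intent vector aligns with it exactly, which is the intuition behind keeping one representation per intent.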
