A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion

Users may struggle to formulate an adequate textual query for their information need. Search engines assist them by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous queries issued by the user. Achieving context awareness is challenging due to data sparsity. We present a novel hierarchical recurrent encoder-decoder architecture that makes it possible to account for sequences of previous queries of arbitrary length. As a result, our suggestions are sensitive to the order of queries in the context while avoiding data sparsity. Additionally, our model can produce suggestions for rare, or long-tail, queries. The suggestions are synthetic and are sampled one word at a time using computationally cheap decoding techniques. This is in contrast to current synthetic suggestion models, which rely on machine learning pipelines and hand-engineered feature sets. Results show that our model outperforms existing context-aware approaches in a next-query prediction setting. In addition to query suggestion, our architecture is general enough to be used in a variety of other applications.
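The hierarchy described above can be illustrated with a minimal sketch: one recurrent network reads the words of each query, a second reads the resulting query vectors in session order, and a decoder samples the next query one word at a time. This is not the authors' implementation. It assumes plain tanh recurrences, untrained random weights, a closed toy vocabulary, and a decoder seeded directly with the session state; the class name, method names, and the `EOQ` token are hypothetical.

```python
import math
import random

EOQ = "</q>"  # hypothetical end-of-query token


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_rec, x, h):
    """One plain tanh recurrence: h' = tanh(W_in x + W_rec h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]


class HierarchicalEncoderDecoder:
    def __init__(self, vocab, dim=8, seed=0):
        self.rng = random.Random(seed)
        self.vocab = list(vocab)  # closed vocabulary; unknown words raise KeyError
        self.dim = dim
        self.embed = {w: [self.rng.uniform(-0.1, 0.1) for _ in range(dim)]
                      for w in self.vocab}
        # Separate weights for the query-level, session-level and decoder RNNs.
        self.q_in, self.q_rec = make_matrix(dim, dim, self.rng), make_matrix(dim, dim, self.rng)
        self.s_in, self.s_rec = make_matrix(dim, dim, self.rng), make_matrix(dim, dim, self.rng)
        self.d_in, self.d_rec = make_matrix(dim, dim, self.rng), make_matrix(dim, dim, self.rng)
        self.out = make_matrix(len(self.vocab), dim, self.rng)

    def encode_query(self, words):
        """Query-level encoder: fold the words of one query into a vector."""
        h = [0.0] * self.dim
        for word in words:
            h = rnn_step(self.q_in, self.q_rec, self.embed[word], h)
        return h

    def encode_session(self, queries):
        """Session-level encoder: fold query vectors in order, for any session length."""
        s = [0.0] * self.dim
        for query in queries:
            s = rnn_step(self.s_in, self.s_rec, self.encode_query(query), s)
        return s

    def sample_next_query(self, queries, max_len=5):
        """Decoder: sample one word at a time, starting from the session state."""
        h = self.encode_session(queries)
        prev = [0.0] * self.dim
        words = []
        for _ in range(max_len):
            h = rnn_step(self.d_in, self.d_rec, prev, h)
            logits = matvec(self.out, h)
            top = max(logits)
            weights = [math.exp(l - top) for l in logits]  # unnormalised softmax
            word = self.rng.choices(self.vocab, weights=weights, k=1)[0]
            if word == EOQ:
                break
            words.append(word)
            prev = self.embed[word]
        return words


model = HierarchicalEncoderDecoder(["cheap", "flights", "paris", "hotels", EOQ])
session = [["cheap", "flights"], ["flights", "paris"]]
suggestion = model.sample_next_query(session)
```

Because the weights are random, the sampled words are arbitrary; the sketch only shows the data flow. Reversing the order of `session` changes the session state, which is the order sensitivity the abstract refers to.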
