Learning from homologous queries and semantically related terms for query auto completion

We propose a learning-to-rank-based query auto completion model (L2R-QAC) that exploits the contributions of so-called homologous queries for a QAC candidate, taking two kinds of homologous queries into account. We propose semantic features for QAC that use the semantic relatedness of terms inside a query candidate and of pairs of terms drawn from a candidate and from queries previously submitted in the same session. We analyze the effectiveness of our L2R-QAC model with the newly added features and find that it significantly outperforms state-of-the-art QAC models, whether based on learning to rank or on popularity.

Query auto completion (QAC) models recommend possible queries to web search users when they start typing a query prefix. Most of today's QAC models rank candidate queries by popularity (i.e., frequency), and in doing so they tend to follow a strict query matching policy when counting the queries. That is, they ignore the contributions from so-called homologous queries: queries with the same terms in a different order, or queries that expand the original query. Importantly, homologous queries often express a remarkably similar search intent. Moreover, today's QAC approaches often ignore semantically related terms. We argue that users are prone to combine semantically related terms when generating queries. We propose a learning-to-rank-based QAC approach in which, for the first time, features derived from homologous queries and semantically related terms are introduced. In particular, we consider: (i) the observed and predicted popularity of homologous queries for a query candidate; and (ii) the semantic relatedness of pairs of terms inside a query and pairs of queries inside a session. We quantify the improvement brought by the proposed new features using two large-scale real-world query logs and show that the mean reciprocal rank and the success rate can be improved by up to 9% over state-of-the-art QAC models.
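To make the notion of homologous queries concrete, the following is a minimal sketch, not the paper's implementation. It assumes whitespace tokenization, treats an "expanding" query as one whose term sequence starts with the candidate's terms, and uses a toy query log. The function names and the returned feature names are hypothetical, chosen for illustration only.

```python
from collections import Counter


def terms(query):
    # Assumption: lowercase whitespace tokenization stands in for real query parsing.
    return query.lower().split()


def is_same_term_homolog(candidate, other):
    # Same multiset of terms, but in a different order.
    c, o = terms(candidate), terms(other)
    return c != o and sorted(c) == sorted(o)


def is_super_query_homolog(candidate, other):
    # Assumption: "expands the original query" means the candidate's
    # term sequence followed by at least one extra term.
    c, o = terms(candidate), terms(other)
    return len(o) > len(c) and o[:len(c)] == c


def homolog_features(candidate, log_counts):
    # Observed frequency of the candidate under strict matching, plus the
    # summed frequencies of its two kinds of homologous queries.
    same = sum(n for q, n in log_counts.items()
               if is_same_term_homolog(candidate, q))
    sup = sum(n for q, n in log_counts.items()
              if is_super_query_homolog(candidate, q))
    return {
        "exact": log_counts.get(candidate, 0),
        "same_terms": same,
        "super_query": sup,
    }


# Toy query log (fabricated counts, for illustration only).
log_counts = Counter({
    "new york weather": 5,
    "weather new york": 3,
    "new york weather forecast": 2,
    "new york times": 7,
})

features = homolog_features("new york weather", log_counts)
print(features)
```

Under strict matching only the count for "new york weather" itself would be used. Here the reordered and expanded variants contribute separate counts, which a learning-to-rank model could consume as additional features alongside the exact-match frequency.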

[5]  Maarten de Rijke, et al.  Learning from homologous queries and semantically related terms for query auto completion, 2016.
