A Survey of Query Auto Completion in Information Retrieval

In information retrieval, query auto completion (QAC), also known as type-ahead and auto-complete suggestion, refers to the following functionality: given a prefix consisting of a number of characters entered into a search box, the user interface proposes alternative ways of extending the prefix to a full query. QAC helps users to formulate their query when they have an intent in mind but not a clear way of expressing it in a query. It helps to avoid possible spelling mistakes, especially on devices with small screens. It saves keystrokes and cuts down the search duration of users, which implies a lower load on the search engine and results in savings in machine resources and maintenance.

Because of the clear benefits of QAC, a considerable number of algorithmic approaches to QAC have been proposed in the past few years. Query logs have proven to be a key asset underlying most of the recent research. This monograph surveys this research. It focuses on summarizing the literature on QAC and provides a general understanding of the wealth of QAC approaches that are currently available.

A Survey of Query Auto Completion in Information Retrieval is an ideal reference on the topic. Its contributions can be summarized as follows: It provides researchers who are working on query auto completion or related problems in the field of information retrieval with a good overview and analysis of state-of-the-art QAC approaches. In particular, for researchers new to the field, the survey can serve as an introduction to the state of the art. It also offers a comprehensive perspective on QAC approaches by presenting a taxonomy of existing solutions. In addition, it presents solutions for QAC under different conditions such as available high-resolution query logs, in-depth user interactions with QAC using eye-tracking, and elaborate user engagements in a QAC process. It also discusses practical issues related to QAC. Lastly, it presents a detailed discussion of core challenges and promising open directions in QAC.
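To make the functionality described above concrete, the following is a minimal sketch of prefix-based completion driven by a query log. It assumes a toy in-memory log and ranks candidates by how often they were issued in the past, which is only one simple baseline and is not presented here as the method of any particular surveyed approach. The names `build_counts`, `complete`, and `sample_log` are hypothetical and chosen for illustration.

```python
from collections import Counter


def build_counts(query_log):
    """Count how often each normalized query appears in the log."""
    return Counter(q.strip().lower() for q in query_log if q.strip())


def complete(prefix, counts, k=3):
    """Return up to k logged queries that extend the prefix.

    Candidates are ordered by descending past frequency; ties are
    broken alphabetically so the output is deterministic.
    """
    prefix = prefix.lower()
    matches = [(q, c) for q, c in counts.items() if q.startswith(prefix)]
    matches.sort(key=lambda qc: (-qc[1], qc[0]))
    return [q for q, _ in matches[:k]]


# Hypothetical toy query log (an assumption, not data from the survey).
sample_log = [
    "weather today",
    "weather today",
    "weather tomorrow",
    "web search",
    "weather today",
    "web search",
]
sample_counts = build_counts(sample_log)
```

With this toy log, `complete("we", sample_counts, k=2)` ranks the two most frequent queries that start with "we". A linear scan over all logged queries is used only for brevity; a real system would need an index that supports fast prefix lookup.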
