Type-Ahead Exploratory Search through Typo and Word Order Tolerant Autocompletion

There is increasing interest in recommending queries and query results to the user instantly, i.e., while the user is still typing. This is evidenced by the emergence of several systems that offer such functionality, e.g., Google Instant Search for Web searching and Facebook Search for social searching. In this paper we consider richer recommendations that include several other kinds of supplementary information, giving the user a better overview of the search space. This supplementary information can be the result of various tasks (e.g., textual clustering or entity mining over the top search results), may be very large, and may be expensive to derive. Presenting these recommendations instantly (as the user types a query letter by letter) helps the user (a) quickly discover what is popular among other users, (b) quickly decide which of the suggested query completions to use, and (c) decide which hits of the returned answer to inspect. We focus on making this feasible (scalable) and flexible. Regarding scalability, we elaborate on an approach based on precomputed information, and we comparatively evaluate various trie-based index structures for making real-time interaction feasible even when the available memory is limited. Specifically, we show how, with modest hardware (like that of a mobile device), one can provide instant access to large amounts of data. Moreover, we propose and experimentally evaluate an incremental procedure for updating the index. To improve the throughput that can be served, we analyze and experimentally evaluate various policies for caching subtries. With regard to flexibility, in order to reduce the user's effort and increase the exploitation of the precomputed information, we elaborate on how, assuming the proposed trie-based index structures, the recommendations can tolerate different word orders and spelling errors.
The experimental results revealed that this functionality significantly increases the number of recommendations, especially for queries that contain several words. Finally, we propose an algorithm for computing the top-K suggestions that exploits the ranking information in order to reduce trie traversals. An experimental evaluation shows that the proposed algorithm substantially reduces retrieval time.
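One common way to exploit ranking information during top-K completion, sketched here as an assumption rather than as the paper's algorithm, is to store in every trie node the maximum score found in its subtree and then expand nodes best-first with a priority queue, stopping as soon as K completions have been popped. Subtrees whose best score cannot beat the completions already found are never opened, which is where traversals are saved. The names `Node`, `best` and `top_k` are hypothetical, and the sketch assumes each query is inserted only once.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.score: Optional[int] = None   # score of the query ending here, if any
        self.best: float = float("-inf")   # max score anywhere in this subtree


def insert(root: Node, query: str, score: int) -> None:
    """Insert a query once; update the subtree maxima along its path."""
    path = [root]
    node = root
    for ch in query:
        node = node.children.setdefault(ch, Node())
        path.append(node)
    node.score = score
    for n in path:
        n.best = max(n.best, score)


def top_k(root: Node, prefix: str, k: int) -> List[Tuple[str, int]]:
    node: Optional[Node] = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    tie = itertools.count()  # tie-breaker so heapq never compares Node objects
    # Max-heap via negated keys. An entry is either a subtree (key = upper bound)
    # or a concrete completion (key = exact score).
    heap = [(-node.best, next(tie), False, prefix, node)]
    results: List[Tuple[str, int]] = []
    while heap and len(results) < k:
        neg_key, _, is_result, text, n = heapq.heappop(heap)
        if is_result:
            # No remaining entry can score higher, so this completion is final.
            results.append((text, -neg_key))
            continue
        if n.score is not None:
            heapq.heappush(heap, (-n.score, next(tie), True, text, n))
        for ch, child in n.children.items():
            heapq.heappush(heap, (-child.best, next(tie), False, text + ch, child))
    return results


root = Node()
for q, s in [("instant search", 90), ("instance", 40), ("install", 70),
             ("insight", 60), ("java", 100)]:
    insert(root, q, s)
print(top_k(root, "ins", 2))  # [('instant search', 90), ('install', 70)]
```

In this example the subtrees below `insi` and `instanc` are pushed onto the heap but never expanded, because their bounds (60 and 40) never exceed the scores of the two completions returned.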