Score-safe term-dependency processing with hybrid indexes

Score-safe index processing has received a great deal of attention over the last two decades. By precomputing maximum term impacts at indexing time, these methods greatly reduce the number of scoring operations and locate the top-k documents for a query efficiently. However, they typically ignore the effectiveness gains that sequential dependency models can provide. We present a hybrid approach that combines score-safe processing with suffix-based self-indexing structures to provide efficient and effective top-k document retrieval.
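The following is a minimal Python sketch of how precomputed maximum term impacts allow score-safe pruning. It uses WAND-style dynamic pruning, one well-known score-safe strategy. It assumes that a document's score is simply the sum of its per-term impacts and that each postings list is an in-memory list of `(doc_id, impact)` pairs sorted by `doc_id`. The function `wand_top_k` and this data layout are hypothetical illustrations. This is not the hybrid method proposed here, and the suffix-based term-dependency component is not shown.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical sketch: WAND-style pruning over in-memory postings lists,
# not the hybrid (suffix-based) index described in this work.
Posting = Tuple[int, float]  # (doc_id, precomputed term impact)


def wand_top_k(postings: Dict[str, List[Posting]], k: int) -> List[Tuple[float, int]]:
    """Return the top-k (score, doc_id) pairs, best first, where a document's
    score is the sum of its term impacts. Postings must be sorted by doc_id."""
    if k <= 0:
        return []
    lists = {t: pl for t, pl in postings.items() if pl}
    # Per-term maximum impact: a real index would store this at indexing time.
    upper = {t: max(s for _, s in pl) for t, pl in lists.items()}
    pos = {t: 0 for t in lists}  # cursor into each postings list
    heap: List[Tuple[float, int]] = []  # min-heap holding the current top-k

    def cur(t: str) -> Optional[int]:
        """Doc id under term t's cursor, or None once the list is exhausted."""
        return lists[t][pos[t]][0] if pos[t] < len(lists[t]) else None

    while True:
        live = sorted((t for t in lists if cur(t) is not None), key=cur)
        if not live:
            break
        # Entry threshold: a document must score strictly above theta.
        theta = heap[0][0] if len(heap) == k else float("-inf")
        # Pivot: first list at which the summed upper bounds exceed theta.
        acc, pivot = 0.0, None
        for i, t in enumerate(live):
            acc += upper[t]
            if acc > theta:
                pivot = i
                break
        if pivot is None:
            break  # no unscored document can still enter the top-k
        pivot_doc = cur(live[pivot])
        if cur(live[0]) == pivot_doc:
            # Every list up to the pivot sits on pivot_doc: score it fully.
            score = 0.0
            for t in live:
                if cur(t) == pivot_doc:
                    score += lists[t][pos[t]][1]
                    pos[t] += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
        else:
            # Documents before pivot_doc occur only in live[:pivot], whose
            # bounds sum to at most theta, so skipping them is score-safe.
            for t in live[:pivot]:
                while cur(t) is not None and cur(t) < pivot_doc:
                    pos[t] += 1
    return sorted(heap, reverse=True)


postings = {
    "succinct": [(1, 2.0), (4, 1.5), (7, 0.5)],
    "top-k": [(1, 1.0), (2, 3.0), (7, 2.5)],
    "retrieval": [(2, 0.5), (4, 0.5), (7, 1.0)],
}
print(wand_top_k(postings, 2))  # → [(4.0, 7), (3.5, 2)]
```

In this example, document 4 is never scored. Once the heap holds two documents, the threshold is 3.0, and the summed bounds of the lists containing document 4 (2.0 + 1.0) do not exceed it. The result still matches exhaustive evaluation, which is what "score-safe" means.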
