NBLucene: Flexible and Efficient Open Source Search Engine

The most popular open source projects for text search have been designed to support many features. These projects are well written in Java for cross-platform use. When conducting research, however, execution efficiency matters more, and this is a weakness of applications written in Java. It is also difficult for Java programs to exploit the parallel hardware of modern computer systems, such as SIMD instructions and GPUs. To this end, we extend an open source text search project written in C++ for research purposes.
