Efficient GPU-Based Query Processing with Pruned List Caching in Search Engines

There are two inherent obstacles to using Graphics Processing Units (GPUs) effectively for query processing in search engines: (a) the highly restricted GPU memory space, and (b) CPU-GPU transfer latency. Ao et al. previously presented a GPU method for lists intersection, an essential component of AND-based query processing. However, that work assumes the whole inverted index fits in GPU memory and does not address document ranking. In this paper, we describe and analyze a GPU query processing method that incorporates both lists intersection and top-k ranking. We introduce a parameterized GPU caching method for pruned posting lists, where the parameter determines how much GPU memory is used for caching. This method enables list caching for large inverted indexes within the limited GPU memory, a qualitative improvement over previous work. We also give a mathematical model that identifies an approximately optimal choice of the parameter. Experimental results indicate that, under the pruned list caching policy, this GPU approach achieves better query throughput than its CPU counterpart, even when the inverted index is much larger than the GPU memory.
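The two ideas the abstract combines can be sketched in a few lines: sorted-posting-list intersection for AND queries, plus a cache that admits only a pruned prefix of each posting list under a fixed memory budget (a stand-in for the GPU memory constraint). This is an illustrative sketch only; the class and parameter names (`PrunedListCache`, `alpha`) are hypothetical and the pruning rule here is a simple prefix cut, not the paper's algorithm.

```python
def intersect(a, b):
    """Intersect two sorted posting lists of document ids (two-pointer merge)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


class PrunedListCache:
    """Cache a fraction `alpha` of each posting list (its prefix), subject to
    a total capacity measured in postings -- a stand-in for GPU memory."""

    def __init__(self, alpha, capacity):
        self.alpha = alpha          # fraction of each list to keep (the tunable parameter)
        self.capacity = capacity    # total postings budget
        self.used = 0
        self.cache = {}

    def admit(self, term, postings):
        k = max(1, int(self.alpha * len(postings)))
        if self.used + k <= self.capacity:
            self.cache[term] = postings[:k]
            self.used += k
            return True
        return False


# Toy index; in practice pruned lists are usually impact- or frequency-ordered
# so the retained prefix holds the most valuable postings.
index = {"gpu": [1, 3, 5, 9], "cache": [2, 3, 7, 9, 11]}
c = PrunedListCache(alpha=0.5, capacity=8)
for term, postings in index.items():
    c.admit(term, postings)

# Intersect the cached (pruned) lists; a real system would resolve cache
# misses against the full index on the CPU side.
print(intersect(c.cache["gpu"], c.cache["cache"]))  # → [3]
```

A larger `alpha` caches more of each list (fewer CPU fallbacks) but fits fewer lists in the budget; the paper's mathematical model addresses exactly this trade-off when choosing the parameter.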

[1] Bingsheng He, et al. Relational query coprocessing on graphics processors. TODS, 2009.

[2] Gang Wang, et al. Latency-aware strategy for static list caching in flash-based web search engines. CIKM, 2013.

[3] Gang Wang, et al. Efficient Parallel Lists Intersection and Index Compression Algorithms using Graphics Processing Units. Proc. VLDB Endow., 2011.

[4] Stephen E. Robertson, et al. Okapi at TREC-3. Centre for Interactive Systems Research, Department of Information Science, 1996.

[5] Sudhakar Yalamanchili, et al. Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation. 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2012.

[6] Özgür Ulusoy, et al. A five-level static cache architecture for web search engines. Inf. Process. Manag., 2012.

[7] Özgür Ulusoy, et al. Cost-Aware Strategies for Query Result Caching in Web Search Engines. TWEB, 2011.

[8] Ellen M. Voorhees, et al. Overview of TREC 2003. 2003.

[9] Kim M. Hazelwood, et al. Where is the data? Why you cannot debate CPU vs. GPU performance without the answer. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2011.

[10] Torsten Suel, et al. Using graphics processors for high-performance IR query processing. WWW, 2008.

[11] Wagner Meira, et al. Rank-preserving two-level caching for scalable search engines. SIGIR '01, 2001.

[12] Ricardo A. Baeza-Yates, et al. A Three Level Search Engine Index Based in Query Log Distribution. SPIRE, 2003.

[13] Gang Wang, et al. Efficient lists intersection by CPU-GPU cooperative computing. IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010.

[14] Sudipto Guha, et al. Improving the Performance of List Intersection. Proc. VLDB Endow., 2009.

[15] Torsten Suel, et al. Three-level caching for efficient query processing in large Web search engines. WWW, 2005.

[16] Torsten Suel, et al. Improved techniques for result caching in web search engines. WWW '09, 2009.

[17] Alistair Moffat, et al. Pruned query evaluation using pre-computed impacts. SIGIR, 2006.

[18] Mahmut T. Kandemir, et al. Exploiting Core Criticality for Enhanced GPU Performance. SIGMETRICS, 2016.

[19] Alexandros Ntoulas, et al. Pruning policies for two-tiered inverted index with correctness guarantee. SIGIR, 2007.

[20] Dinesh Manocha, et al. Fast computation of database operations using graphics processors. SIGMOD '04, 2004.

[21] Justin Zobel, et al. Dynamic index pruning for effective caching. CIKM '07, 2007.

[22] Fan Zhang, et al. Revisiting globally sorted indexes for efficient document retrieval. WSDM '10, 2010.

[23] Bingsheng He, et al. In-Cache Query Co-Processing on Coupled CPU-GPU Architectures. Proc. VLDB Endow., 2014.

[24] Frank Wm. Tompa, et al. Skewed partial bitvectors for list intersection. SIGIR, 2014.

[25] Bingsheng He, et al. GPUQP: query co-processing using graphics processors. SIGMOD '07, 2007.

[26] Vo Anh, et al. Impact-Based Document Retrieval. 2004.

[27] Charles L. A. Clarke, et al. Faster and smaller inverted indices with treaps. SIGIR, 2013.

[28] Gang Wang, et al. Fast lists intersection with Bloom filter using graphics processing units. SAC '11, 2011.

[29] Aristides Gionis, et al. The impact of caching on search engines. SIGIR, 2007.

[30] Mike O'Connor, et al. MemcachedGPU: scaling-up scale-out key-value stores. SoCC, 2015.

[31] Satoshi Matsuoka, et al. GPU-Accelerated Large-Scale Distributed Sorting Coping with Device Memory Capacity. IEEE Transactions on Big Data, 2016.

[32] Özgür Ulusoy, et al. Static index pruning in web search engines: Combining term and document popularities with query views. TOIS, 2012.

[33] Torsten Suel, et al. Optimized Query Execution in Large Search Engines with Global Page Ordering. VLDB, 2003.

[34] Hao Li, et al. Join algorithms on GPUs: A revisit after seven years. IEEE International Conference on Big Data, 2015.

[35] Shuaiwen Song, et al. Tag-Split Cache for Efficient GPGPU Cache Utilization. ICS, 2016.

[36] Sergey Brin, et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Comput. Networks, 1998.

[37] J. Shane Culpepper, et al. Efficient set intersection for inverted indexing. TOIS, 2010.