Griffin: uniting CPU and GPU in information retrieval systems for intra-query parallelism

Interactive information retrieval services, such as enterprise search and document search, must provide relevant results with consistent, low response times in the face of rapidly growing data sets and query loads. These growing demands have led researchers to consider a wide range of optimizations to reduce response latency, including query processing parallelization and acceleration with co-processors such as GPUs. However, previous work runs each query entirely on either the GPU or the CPU, ignoring the fact that the best processor for a given query depends on the query's characteristics, which may change as processing proceeds. We present Griffin, an IR system that dynamically combines GPU- and CPU-based algorithms to process individual queries according to their characteristics. Griffin uses state-of-the-art CPU-based query processing techniques and incorporates a novel approach to GPU-based query evaluation. Our GPU-based approach, as far as we know, achieves the best available GPU search performance by leveraging a new compression scheme and exploiting an advanced merge-based intersection algorithm. We evaluate Griffin with real-world queries and datasets, and show that it improves query performance by 10x compared to a highly optimized CPU-only implementation, and by 1.5x compared to our GPU approach running alone. We also find that Griffin reduces the 95th-, 99th-, and 99.9th-percentile query response times by 10.4x, 16.1x, and 26.8x, respectively.
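
To illustrate the idea of choosing a processor per intersection step rather than per query, here is a minimal sketch in Python. It is not Griffin's implementation: the length-ratio heuristic, the `RATIO_THRESHOLD` value, and all function names are assumptions made for illustration, and both "devices" are simulated on the CPU. The merge routine stands in for a merge-based GPU kernel, and the binary-search routine stands in for a CPU path that tends to suit lists of very different lengths.

```python
from bisect import bisect_left

# Hypothetical cutoff, not taken from the paper: when the longer list is at
# least this many times larger than the shorter one, prefer the search path.
RATIO_THRESHOLD = 32


def merge_intersect(a, b):
    """Linear merge of two sorted lists (stand-in for a merge-based GPU kernel)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_intersect(short, long):
    """Binary-search each element of `short` in `long` (stand-in for a CPU path)."""
    out = []
    lo = 0
    for x in short:
        lo = bisect_left(long, x, lo)  # resume from the previous position
        if lo == len(long):
            break
        if long[lo] == x:
            out.append(x)
    return out


def choose_device(len_short, len_long):
    """Assumed heuristic: skewed list lengths go to 'cpu', balanced ones to 'gpu'."""
    if len_short == 0:
        return "cpu"
    return "cpu" if len_long / len_short >= RATIO_THRESHOLD else "gpu"


def process_query(posting_lists):
    """Intersect sorted posting lists shortest-first, re-deciding the device each step."""
    if not posting_lists:
        return [], []
    lists = sorted(posting_lists, key=len)
    result = lists[0]
    trace = []
    for nxt in lists[1:]:
        device = choose_device(len(result), len(nxt))
        trace.append(device)
        if device == "gpu":
            result = merge_intersect(result, nxt)
        else:
            result = search_intersect(result, nxt)
    return result, trace
```

In this sketch the decision is re-evaluated after every intersection because the intermediate result usually shrinks, so a query that starts with two lists of similar length may later intersect a small intermediate result against a much longer list. That mirrors the abstract's observation that a query's characteristics can change as processing proceeds.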
