Efficient lists intersection by CPU-GPU cooperative computing

Lists intersection is an important operation in modern web search engines. Many prior studies have focused on a single platform: single-core or multi-core CPUs, or many-core GPUs. In this paper, we propose a CPU-GPU cooperative model that combines the computing power of the CPU and the GPU to perform lists intersection more efficiently. In the synchronous mode, queries are grouped into batches and processed by the GPU for high throughput; for this mode we design a query-parallel GPU algorithm based on an element-thread mapping strategy for load balancing. In the traditional asynchronous mode, queries are processed one by one by either the CPU or the GPU to minimize response time; for this mode we design an online scheduling algorithm that predicts whether the CPU or the GPU will process a given query faster. The scheduling metric is a regression formula obtained by regression analysis on a large number of experimental results. We evaluate our approaches with extensive experiments. Results on the TREC Gov and Baidu datasets show that they significantly improve the performance of lists intersection.
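As an illustration of the batched, query-parallel idea, the following is a minimal sequential sketch in Python, not the paper's GPU implementation. It assumes that "element-thread mapping" means assigning one thread to each element of the shortest list of every query in the batch, so that the work per thread is roughly equal regardless of list lengths. The function names `contains` and `intersect_batch`, the prefix-sum layout, and the use of binary search for probing are all assumptions made for this example; the loop over `tid` stands in for the GPU threads.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def contains(sorted_list, value):
    """Binary-search membership test on a sorted list of docIDs."""
    i = bisect_left(sorted_list, value)
    return i < len(sorted_list) and sorted_list[i] == value


def intersect_batch(batch):
    """Intersect the sorted lists of every query in a batch.

    batch: list of queries; each query is a non-empty list of sorted
    docID lists. Returns one sorted result list per query.
    """
    # Order each query's lists by length so the shortest one drives the probes.
    ordered = [sorted(query, key=len) for query in batch]

    # Prefix sums over the shortest-list lengths: offsets[k] is the first
    # global thread id that belongs to query k (hypothetical layout).
    offsets = [0] + list(accumulate(len(q[0]) for q in ordered))

    results = [[] for _ in batch]

    # Each iteration plays the role of one GPU thread handling one element.
    for tid in range(offsets[-1]):
        # Map the global thread id back to (query, element index).
        qid = bisect_right(offsets, tid) - 1
        lists = ordered[qid]
        value = lists[0][tid - offsets[qid]]

        # Keep the element only if every other list of the query contains it.
        if all(contains(other, value) for other in lists[1:]):
            results[qid].append(value)

    return results


if __name__ == "__main__":
    demo = [
        [[1, 3, 5, 7, 9], [3, 4, 5, 9]],
        [[2, 4], [1, 2, 3, 4], [2, 4, 6]],
    ]
    print(intersect_batch(demo))
```

Because every thread handles exactly one element, a query with a long shortest list simply receives more threads instead of making a single thread do more work, which is one plausible reading of how such a mapping balances load across a batch.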
