Paper Information - Efficient Parallel Lists Intersection and Index Compression Algorithms using Graphics Processing Units

Major web search engines answer thousands of queries per second requesting information about billions of web pages. The data sizes and query loads are growing at an exponential rate. To manage the heavy workload, we consider techniques for utilizing a Graphics Processing Unit (GPU). We investigate new approaches to improve two important operations of search engines -- lists intersection and index compression. For lists intersection, we develop techniques for efficient implementation of the binary search algorithm for parallel computation. We inspect some representative real-world datasets and find that a sufficiently long inverted list has an overall linear rate of increase. Based on this observation, we propose Linear Regression and Hash Segmentation techniques for contracting the search range. For index compression, the traditional d-gap based compression schemata are not well-suited for parallel computation, so we propose a Linear Regression Compression schema which has an inherent parallel structure. We further discuss how to efficiently intersect the compressed lists on a GPU. Our experimental results show significant improvements in the query processing throughput on several datasets.
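The two ideas in the abstract, contracting the binary-search range with a regression line and storing each docID as a deviation from that line, can be illustrated with a small CPU-side sketch. This is not the paper's GPU implementation. The function names (`fit_line`, `contracted_search`, `intersect`, `lr_compress`, `lr_decode_at`) are hypothetical, the lists are assumed to hold strictly increasing integer docIDs, and the Hash Segmentation technique is not shown.

```python
from bisect import bisect_left
from math import ceil, floor


def fit_line(postings):
    """Least-squares fit docID ~ slope * i + intercept, plus residual bounds."""
    n = len(postings)
    mean_i = (n - 1) / 2
    mean_v = sum(postings) / n
    var_i = sum((i - mean_i) ** 2 for i in range(n))
    cov = sum((i - mean_i) * (v - mean_v) for i, v in enumerate(postings))
    slope = cov / var_i if var_i else 1.0  # a single-element list has no slope
    intercept = mean_v - slope * mean_i
    residuals = [v - (slope * i + intercept) for i, v in enumerate(postings)]
    return slope, intercept, min(residuals), max(residuals)


def contracted_search(postings, model, x):
    """Binary search for x, restricted to the index range the line allows."""
    slope, intercept, r_min, r_max = model
    n = len(postings)
    # If postings[i] == x, then slope*i + intercept + r_i == x with
    # r_min <= r_i <= r_max, which bounds i. One extra slot of margin on
    # each side absorbs floating-point rounding.
    lo = floor((x - intercept - r_max) / slope) - 1
    hi = ceil((x - intercept - r_min) / slope) + 2  # exclusive upper bound
    lo = min(max(lo, 0), n)
    hi = min(max(hi, lo), n)
    pos = bisect_left(postings, x, lo, hi)
    return pos < n and postings[pos] == x


def intersect(short_list, long_list):
    """Intersect two sorted docID lists by probing the longer one."""
    if not short_list or not long_list:
        return []
    model = fit_line(long_list)
    # Each probe is independent; on a GPU, one thread could handle each x.
    return [x for x in short_list if contracted_search(long_list, model, x)]


def lr_compress(postings):
    """Store each docID as a non-negative offset from the rounded line."""
    slope, intercept, _, _ = fit_line(postings)
    deltas = [v - round(slope * i + intercept) for i, v in enumerate(postings)]
    base = min(deltas)
    return slope, intercept, base, [d - base for d in deltas]


def lr_decode_at(compressed, i):
    """Decode element i alone, without touching any other element."""
    slope, intercept, base, offsets = compressed
    return round(slope * i + intercept) + base + offsets[i]
```

Unlike d-gap coding, where recovering element i requires a prefix sum over all earlier gaps, `lr_decode_at` needs only the index and the stored offset. This is the property that makes such a scheme a natural fit for parallel decoding. In this sketch the offsets are kept as plain Python integers; a real index would pack them into a fixed number of bits.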

[1] Gang Wang, et al. Efficient lists intersection by CPU-GPU cooperative computing, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).

[2] Derick Wood, et al. A survey of adaptive sorting algorithms, 1992, CSUR.

[3] Alon Itai, et al. Interpolation search—a log logN search, 1978, CACM.

[4] Vipin Kumar, et al. Isoefficiency: measuring the scalability of parallel algorithms and architectures, 1993, IEEE Parallel & Distributed Technology: Systems & Applications.

[5] Erik D. Demaine, et al. Adaptive set intersections, unions, and differences, 2000, SODA '00.

[6] Torsten Suel, et al. Inverted index compression and query processing with optimized document ordering, 2009, WWW '09.

[7] Sudipto Guha, et al. Improving the Performance of List Intersection, 2009, Proc. VLDB Endow.

[8] Ian H. Witten, et al. Managing Gigabytes: Compressing and Indexing Documents and Images, 1999.

[9] Yao Zhang, et al. Scan primitives for GPU computing, 2007, GH '07.

[10] S. Héman. Super-Scalar Database Compression between RAM and CPU Cache, 2005.

[11] Ricardo A. Baeza-Yates, et al. Experimental Analysis of a Fast Intersection Algorithm for Sorted Sequences, 2005, SPIRE.

[12] Marcin Zukowski, et al. Super-Scalar RAM-CPU Cache Compression, 2006, 22nd International Conference on Data Engineering (ICDE'06).

[13] J. Wishart. Statistical tables, 2018, Global Education Monitoring Report.