Discrete range searching primitive for the GPU and its applications

Graphics processing units (GPUs) provide large computational power at a very low price, which positions them well as a ubiquitous accelerator. However, GPUs are space constrained, so applications developed for them are sensitive to memory use. Such space-constrained devices can benefit greatly from representations that drastically reduce space consumption. One such representation is the succinct representation of trees, which generally supports operations such as parent queries and least common ancestor queries. Mapping such a robust representation to the GPU for targeted applications can substantially increase the problem sizes that can be processed at one time. Space-saving methods such as succinct data structures remain largely unexplored on the GPU. In this work, a succinct representation of ordered trees on the GPU is explored, with application to discrete range searching (DRS). Based on the succinct representations found applicable, a space-saving solution for DRS is presented. In our method, DRS is mapped to a least common ancestor query on a Cartesian tree. For space-efficient DRS queries, we store the succinct representation of the Cartesian tree of an array. Our method uses a maximum of 7.5 bits of additional space per element. Furthermore, our method achieves a speed-up of 20 to 25 times for preprocessing and 25 to 35 times for batch querying over a sequential implementation. Compared to an 8-threaded implementation, our preprocessing and querying methods obtain a speed-up of 6 to 8 times. We also study the applications of DRS on the GPU. Efficient primitives expand the range of applications performed on the GPU, and DRS is one such primitive, with direct applications to string processing, document and text retrieval systems, and least common ancestor queries. We suggest that graph algorithms that use the least common ancestor can be enabled on the GPU based on the DRS primitive.
We also show some applications of DRS in tree queries and string querying.
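
The reduction from DRS to a least common ancestor query can be illustrated with a small sequential sketch. It is not the GPU method described above: it stores the Cartesian tree with explicit parent and child arrays instead of a succinct encoding, and it answers each query by walking parent pointers. It also assumes the DRS query asks for the position of the minimum in a range. The function names (`build_cartesian_tree`, `node_depths`, `lca`, `range_min_index`) are hypothetical and chosen only for this illustration.

```python
def build_cartesian_tree(a):
    """Build the min-Cartesian tree of a with a stack, in O(n) time.

    Returns (root, parent, left, right) as index arrays; -1 means "none".
    Ties are broken so that the leftmost minimum is the ancestor.
    """
    n = len(a)
    parent = [-1] * n
    left = [-1] * n
    right = [-1] * n
    stack = []  # indices on the rightmost spine, values non-decreasing
    for i in range(n):
        last = -1
        # Pop every spine node that is strictly larger than a[i].
        while stack and a[stack[-1]] > a[i]:
            last = stack.pop()
        if last != -1:
            # The last popped subtree becomes the left child of i.
            left[i] = last
            parent[last] = i
        if stack:
            # i becomes the right child of the remaining spine top.
            right[stack[-1]] = i
            parent[i] = stack[-1]
        stack.append(i)
    root = stack[0] if stack else -1
    return root, parent, left, right


def node_depths(root, left, right):
    """Compute the depth of every node by an iterative traversal."""
    depth = [0] * len(left)
    todo = [root] if root != -1 else []
    while todo:
        v = todo.pop()
        for c in (left[v], right[v]):
            if c != -1:
                depth[c] = depth[v] + 1
                todo.append(c)
    return depth


def lca(u, v, parent, depth):
    """Naive least common ancestor: lift the deeper node, then both."""
    while depth[u] > depth[v]:
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:
        u = parent[u]
        v = parent[v]
    return u


def range_min_index(i, j, parent, depth):
    """Index of the minimum of a[i..j] (inclusive) = LCA of nodes i and j."""
    return lca(i, j, parent, depth)


if __name__ == "__main__":
    a = [3, 1, 4, 1, 5, 9, 2, 6]
    root, parent, left, right = build_cartesian_tree(a)
    depth = node_depths(root, left, right)
    k = range_min_index(4, 7, parent, depth)
    print(k, a[k])
```

The key property is that the in-order traversal of the Cartesian tree is the array itself, so the least common ancestor of positions i and j is the smallest element between them. A succinct variant would replace the pointer arrays with a compact bit-level encoding of the tree shape and answer the same query on that encoding.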
