Parallel H-Tree Based Data Cubing on Graphics Processors

Graphics processing units (GPUs) have a SIMD architecture and have recently been widely used as powerful general-purpose co-processors for the CPU. In this paper, we investigate efficient GPU-based data cubing because the most frequent operation in data cube computation is aggregation, an expensive operation that is well suited to SIMD parallel processors. The H-tree is a hyper-linked tree structure used in both top-k H-cubing [21] and the stream cube [20]. Fast H-tree construction, update, and real-time query response are crucial in many OLAP applications. We design highly efficient GPU-based parallel algorithms for these H-tree based data cube operations. This is made possible by techniques such as parallel primitives for segmented data and efficient memory access patterns, which achieve load balance on the GPU while hiding memory access latency. As a result, our GPU algorithms can often achieve more than an order of magnitude speedup over their sequential counterparts on a single CPU. To the best of our knowledge, this is the first attempt to develop parallel data cubing algorithms on graphics processors.
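To illustrate why aggregation maps well onto parallel primitives for segmented data, the following is a minimal sketch of a segmented inclusive scan used for group-by summation. It is not the algorithm from this paper: it is a sequential Python emulation of a Hillis-Steele style doubling scan, and it assumes the tuples are already sorted by their group key. The names `head_flags`, `segmented_inclusive_scan`, and `aggregate_sorted` are hypothetical and chosen only for this example.

```python
from typing import Hashable, List, Sequence, Tuple


def head_flags(keys: Sequence[Hashable]) -> List[bool]:
    """Mark the first element of every run of equal keys (a segment head)."""
    return [i == 0 or keys[i] != keys[i - 1] for i in range(len(keys))]


def segmented_inclusive_scan(values: Sequence[float],
                             flags: Sequence[bool]) -> List[float]:
    """Running sum that restarts at every segment head.

    Each pass of the while loop stands for one data-parallel step; each
    iteration of the inner for loop stands for the work of one GPU thread.
    Reads come from the previous step's arrays and writes go to fresh
    copies, emulating a synchronization barrier between steps.
    """
    vals, flg = list(values), list(flags)
    n = len(vals)
    step = 1
    while step < n:
        new_vals, new_flg = list(vals), list(flg)
        for i in range(step, n):
            if not flg[i]:
                # No segment head seen yet within the window ending at i,
                # so fold in the partial sum ending `step` positions earlier.
                new_vals[i] = vals[i - step] + vals[i]
                new_flg[i] = flg[i - step]
        vals, flg = new_vals, new_flg
        step *= 2
    return vals


def aggregate_sorted(keys: Sequence[Hashable],
                     measures: Sequence[float]) -> List[Tuple[Hashable, float]]:
    """Return (key, sum) per group; the last element of a segment holds its total."""
    flags = head_flags(keys)
    scanned = segmented_inclusive_scan(measures, flags)
    n = len(keys)
    return [(keys[i], scanned[i]) for i in range(n) if i == n - 1 or flags[i + 1]]


if __name__ == "__main__":
    print(aggregate_sorted(["a", "a", "b", "c", "c", "c"], [1, 2, 3, 4, 5, 6]))
```

The scan finishes in a logarithmic number of steps regardless of how unevenly the group sizes are distributed, which is one reason segmented primitives help with load balance. On an actual GPU the inner loop would run as concurrent threads, and a work-efficient scan variant would normally be preferred over this doubling form.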

[1]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[2]  Hanan Samet,et al.  A Fast Similarity Join Algorithm Using Graphics Processing Units , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[3]  Kait Kasak.  A Practical Quicksort Algorithm for Graphics Processors , 2011 .

[4]  John D. Owens,et al.  A Work-Efficient Step-Efficient Prefix Sum Algorithm , 2006 .

[5]  Hamid Pirahesh,et al.  Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals , 1996, Data Mining and Knowledge Discovery.

[6]  Susanne E. Hambrusch,et al.  Parallelizing the Data Cube , 2001, ICDT.

[7]  Dinesh Manocha,et al.  GPUTeraSort: high performance graphics co-processor sorting for large database management , 2006, SIGMOD Conference.

[8]  Dinesh Manocha,et al.  Fast and approximate stream mining of quantiles and frequencies using graphics processors , 2005, SIGMOD '05.

[9]  Bingsheng He,et al.  Efficient gather and scatter operations on graphics processors , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[10]  Guy E. Blelloch,et al.  Radix sort for vector multiprocessors , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[11]  Nick Roussopoulos,et al.  Cubetree: organization of and bulk incremental updates on the data cube , 1997, SIGMOD '97.

[12]  Divyakant Agrawal,et al.  Hardware acceleration for spatial selections and joins , 2003, SIGMOD '03.

[13]  Alok N. Choudhary,et al.  High Performance OLAP and Data Mining on Parallel Computers , 1997, Data Mining and Knowledge Discovery.

[14]  Nimrod Megiddo,et al.  Discovery-Driven Exploration of OLAP Data Cubes , 1998, EDBT.

[15]  Sweta Singh,et al.  An Overview of Data Warehousing and OLAP Technology , 2014 .

[16]  Bingsheng He,et al.  Relational joins on graphics processors , 2008, SIGMOD Conference.

[17]  Yixin Chen,et al.  Stream Cube: An Architecture for Multi-Dimensional Analysis of Data Streams , 2005, Distributed and Parallel Databases.

[18]  Andrew Rau-Chaplin,et al.  The cgmCUBE project: Optimizing parallel data cube generation for ROLAP , 2006, Distributed and Parallel Databases.

[19]  David Blythe.  The Direct3D 10 system , 2006, ACM Trans. Graph.

[20]  Dinesh Manocha,et al.  Fast computation of database operations using graphics processors , 2004, SIGMOD '04.

[21]  Andrew Rau-Chaplin,et al.  Computing Partial Data Cubes for Parallel Data Warehousing Applications , 2001, PVM/MPI.

[22]  Hongjun Lu,et al.  Cubes Using Massively Parallel Processors , 1997 .

[23]  Divyakant Agrawal,et al.  Hardware Acceleration in Commercial Databases: A Case Study of Spatial Operations , 2004, VLDB.

[24]  Hans-Peter Seidel,et al.  Stackless KD-Tree Traversal for High Performance GPU Ray Tracing , 2007, Eurographics.

[25]  Jian Pei,et al.  Efficient computation of Iceberg cubes with complex measures , 2001, SIGMOD '01.

[26]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[27]  Ying Chen,et al.  Parallel ROLAP Data Cube Construction on Shared-Nothing Multiprocessors , 2003, Proceedings International Parallel and Distributed Processing Symposium.

[28]  Jens H. Krüger,et al.  A Survey of General‐Purpose Computation on Graphics Hardware , 2007, Eurographics.

[29]  Yao Zhang,et al.  Scan primitives for GPU computing , 2007, GH '07.

[30]  Ruoming Jin,et al.  Communication and memory optimal parallel data cube construction , 2003, Proceedings of the 2003 International Conference on Parallel Processing.

[31]  Kun Zhou,et al.  Real-time KD-tree construction on graphics hardware , 2008, SIGGRAPH 2008.