TPC-H Benchmark Q3, Q6 and Q12 Sequential, OpenMP Parallel and CUDA Parallel Implementation

Over the last decade or two, parallel computing has become a standard way to speed up computationally demanding problems. To deliver the necessary acceleration of their products, application developers have turned to CUDA-powered GPU parallel processing. One of the many uses of CUDA, the well-known NVIDIA parallel computing platform, is accelerating database queries. This paper evaluates the performance of different parallelization techniques using the TPC-H benchmark. Three of the benchmark's ‘SELECT’ queries (Q3, Q6, and Q12) are parallelized first on the CPU using the Open Multi-Processing (OpenMP) API and then on the GPU using CUDA. The GPU's performance is then compared with both sequential execution and parallel execution on a CPU. The implementation achieved speedup ratios ranging from 2 to 4 for the CPU-parallel code and from 10 up to 558 for the GPU-parallel code.