PACC: An Extension of OpenACC for Pipelined Processing of Large Data on a GPU

We present pipelined accelerator (PACC), a suite of directives, and its implementation for accelerating large-scale computation on a graphics processing unit (GPU). PACC extends OpenACC to divide large data that cannot fit entirely in device memory. Given a program with PACC directives, our PACC translator rewrites it into an OpenACC program that divides the data into multiple chunks for accelerated execution. Furthermore, the generated program processes the chunks in a pipeline so that data transfer between the CPU and GPU overlaps with computation on the GPU. We also present preliminary results that show the impact of PACC on program execution time and on the maximum data size that can be processed successfully.
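To illustrate the kind of chunked, pipelined execution described above, the following is a minimal sketch written by hand in plain OpenACC. It is not output of the PACC translator and does not use PACC directive syntax. The function name scale_chunked, its parameters, and the choice of two asynchronous queues are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (illustration only, not part of PACC):
 * computes out[i] = factor * in[i] for n elements, moving the data
 * through the device `chunk` elements at a time. */
void scale_chunked(const float *in, float *out, size_t n, size_t chunk,
                   float factor)
{
    assert(chunk > 0);

    size_t index = 0;
    for (size_t off = 0; off < n; off += chunk, ++index) {
        /* The last chunk may be shorter than the others. */
        size_t len = (n - off < chunk) ? (n - off) : chunk;

        /* Alternate between two async queues so that the transfer of one
         * chunk can overlap with the computation of the other. */
        int q = (int)(index % 2);
        (void)q; /* q is referenced only by the pragmas below */

        /* Wait for the chunk previously issued on this queue, so that
         * at most two chunks are resident in device memory at once. */
        #pragma acc wait(q)

        /* Host-to-device transfer of this chunk only. */
        #pragma acc enter data copyin(in[off:len]) create(out[off:len]) async(q)

        /* Computation on the chunk, ordered after the transfer on queue q. */
        #pragma acc parallel loop present(in[off:len], out[off:len]) async(q)
        for (size_t i = off; i < off + len; ++i)
            out[i] = factor * in[i];

        /* Device-to-host transfer of the result, then release the chunk. */
        #pragma acc exit data copyout(out[off:len]) delete(in[off:len]) async(q)
    }

    /* Drain both queues before returning to the caller. */
    #pragma acc wait
}
```

A call such as scale_chunked(in, out, n, chunk, 2.0f) needs device memory for only about two chunks of each array, rather than for all n elements. Compiled without OpenACC support, the pragmas are ignored and the function runs sequentially on the CPU with the same result.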