Graphics processing units (GPUs) are playing an increasingly important role in highly parallelisable non-graphical applications. With the latest graphics cards boasting a theoretical 165 GFLOPS and 54 GB/s of memory bandwidth spread across 48 ALUs, it is easy to see why. The GPU architecture is particularly suited to the parallel stream processing paradigm, which is characterised by low levels of data dependency, a high data-to-instruction ratio and predictable memory access patterns. One largely ignored yet key bottleneck for this type of processing on GPUs is transfer performance to and from the graphics card, for both download and readback. Existing tools assist developers in many areas of GPU application development, but provide very limited help in achieving the best bi-directional data transfer performance. In this paper, we discuss these limitations and present new investigative tools that allow developers of general-purpose GPU applications to explore the complex array of configuration states that affect both download and readback performance.