A Highly Efficient Implementation of I/O Functions on GPU

The APIs provided by CUDA help programmers develop CUDA applications and achieve high performance on GPUs. However, many I/O operations are not supported in device code. This paper implements most of the common I/O functions, such as file read/write and 'printf', through a host-side agent, exploiting the characteristics of mapped memory in CUDA. The proposed implementation does not degrade the performance of the original application, and users' I/O requests are serviced quickly; moreover, the 'printf' implemented in this paper outperforms the one provided by CUDA. This work offers GPU users a simple and effective real-time debugging method, can improve the productivity of porting legacy C/C++ code to CUDA, and is a valuable step toward broadening CUDA's functionality.
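The host-agent mechanism the abstract describes can be sketched with CUDA mapped (zero-copy) host memory: the device stages an I/O request in a buffer visible to both sides, and the host polls the buffer and services it. The structure, names, and protocol below are illustrative assumptions, not the paper's actual implementation.

```cuda
// Hedged sketch (assumed design, not the paper's code): a device-to-host
// "printf" request channel built on CUDA mapped host memory.
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

struct IoRequest {
    volatile int ready;   // raised by the device when a message is staged
    char message[256];    // payload the host agent will print
};

__global__ void kernel_write_message(IoRequest *req) {
    // Device side: copy a message into the mapped buffer, then raise the flag.
    const char text[] = "hello from the GPU";
    for (int i = 0; i < (int)sizeof(text); ++i)
        req->message[i] = text[i];
    __threadfence_system();   // make the payload visible to the host first
    req->ready = 1;
}

int main() {
    IoRequest *host_req, *dev_req;
    // Allocate pinned host memory that is mapped into the device address space.
    cudaHostAlloc((void **)&host_req, sizeof(IoRequest), cudaHostAllocMapped);
    memset((void *)host_req, 0, sizeof(IoRequest));
    cudaHostGetDevicePointer((void **)&dev_req, host_req, 0);

    kernel_write_message<<<1, 1>>>(dev_req);

    // Host agent: poll the mapped buffer and service the request.
    // (A real agent would run in its own thread and handle many requests.)
    while (host_req->ready == 0) { }
    printf("%s\n", host_req->message);

    cudaFreeHost(host_req);
    return 0;
}
```

Because the kernel launch is asynchronous, the host can service requests while the kernel is still running; this is what lets such a scheme respond to device-side I/O in real time rather than only after kernel completion.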
