Multi-GPU Programming

This chapter covers how to write code that uses multiple GPUs. Host processes and devices can be arranged in many ways in multi-GPU code, but this chapter focuses on two configurations: (1) a single host process driving multiple GPUs through CUDA’s peer-to-peer capabilities, introduced in the CUDA 4.0 Toolkit, and (2) MPI, where each MPI process uses a separate GPU. As an example of each approach, we implement peer-to-peer and MPI multi-GPU versions of the transpose example used in the previous chapter.