Paper information - NUMA-aware image compositing on multi-GPU platform

NUMA-aware image compositing on multi-GPU platform

Sort-last parallel rendering is widely used. Recent GPU developments mean that a PC equipped with multiple GPUs is a viable alternative to a high-cost supercomputer: the Fermi architecture of a single GPU supports uniform virtual addressing, providing a foundation for non-uniform memory access (NUMA) on multi-GPU platforms. Such hardware changes require the user to reconsider the parallel rendering algorithms. In this paper, we thoroughly investigate the NUMA-aware image compositing problem, which is the key final stage in sort-last parallel rendering. Based on a proven radix-k strategy, we find one optimal compositing algorithm, which takes advantage of the NUMA architecture on the multi-GPU platform. We quantitatively analyze different image compositing modes for practical image compositing, taking into account peer-to-peer communication costs between GPUs. Our experiments on various datasets show that our image compositing method is very fast: an image of a few megapixels can be composited in about 10 ms by eight GPUs.
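For readers unfamiliar with the radix-k strategy mentioned in the abstract, the following is a minimal CPU-only sketch of the general idea, not the paper's NUMA-aware algorithm. It assumes premultiplied RGBA pixels, ranks numbered in front-to-back depth order, and group sizes whose product equals the number of ranks. The names `over`, `composite_serial`, `radix_k`, and `assemble` are hypothetical, and the sketch does not model GPU memory or peer-to-peer transfer costs.

```python
from functools import reduce


def over(front, back):
    """Porter-Duff 'over' for premultiplied RGBA pixels (front on top of back)."""
    alpha = front[3]
    return tuple(f + (1.0 - alpha) * b for f, b in zip(front, back))


def composite_serial(images):
    """Reference result: composite whole images front to back, pixel by pixel."""
    return [reduce(over, pixels) for pixels in zip(*images)]


def radix_k(images, ks):
    """Simulate radix-k compositing.

    images[r] is the full image rendered by rank r (a list of RGBA tuples).
    ks lists the group size used in each round; their product must equal
    the number of ranks. Returns, per rank, the span it owns and its pixels.
    """
    p = len(images)
    n = len(images[0])
    spans = [(0, n)] * p                  # pixel range currently held by each rank
    data = [list(img) for img in images]  # pixels of that range
    stride = 1
    for k in ks:
        new_spans, new_data = [None] * p, [None] * p
        for r in range(p):
            digit = (r // stride) % k     # position of rank r inside its group
            base = r - digit * stride
            members = [base + j * stride for j in range(k)]  # front to back
            lo, hi = spans[r]             # all group members share this span
            size = hi - lo
            a = lo + (digit * size) // k
            b = lo + ((digit + 1) * size) // k
            # Rank r collects piece `digit` from every member (direct send
            # within the group) and composites the pieces in depth order.
            pieces = [data[m][a - lo:b - lo] for m in members]
            new_data[r] = [reduce(over, pixels) for pixels in zip(*pieces)]
            new_spans[r] = (a, b)
        spans, data = new_spans, new_data
        stride *= k
    return spans, data


def assemble(spans, data):
    """Gather the disjoint spans owned by the ranks into one final image."""
    final = []
    for (_, _), pixels in sorted(zip(spans, data), key=lambda item: item[0]):
        final.extend(pixels)
    return final
```

With eight ranks, `ks=[2, 2, 2]` behaves like binary swap and `ks=[8]` like direct send; intermediate choices such as `[2, 4]` trade the number of rounds against the number of messages per round, which is the kind of trade-off a platform-aware choice of group sizes would tune.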
