NUMA-aware image compositing on multi-GPU platform

Sort-last parallel rendering is widely used for large-scale visualization. Recent GPU developments make a PC equipped with multiple GPUs a viable, low-cost alternative to a supercomputer: the Fermi GPU architecture supports unified virtual addressing (UVA), which provides a foundation for non-uniform memory access (NUMA) across the GPUs of a multi-GPU platform. Such hardware changes call for a reconsideration of parallel rendering algorithms. In this paper, we thoroughly investigate the NUMA-aware image compositing problem; image compositing is the key final stage of sort-last parallel rendering. Building on the proven radix-k strategy, we identify an optimal compositing algorithm that takes advantage of the NUMA architecture of the multi-GPU platform. We quantitatively analyze different image compositing modes for practical use, taking into account the cost of peer-to-peer communication between GPUs. Our experiments on various datasets show that our image compositing method is very fast: an image of a few megapixels can be composited by eight GPUs in about 10 ms.
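
The abstract builds on radix-k compositing without describing it. The following is a minimal Python sketch of generic radix-k compositing. It is not the paper's NUMA-aware algorithm, and it rests on several simplifying assumptions. All GPUs are simulated as ranks in one process. Images are flattened 1-D arrays of premultiplied RGBA pixels. Rank order is taken to be front-to-back depth order. The function names (`over`, `composite_sequential`, `radix_k`, `radix_k_cost`) are hypothetical.

The algorithm runs in rounds. In each round, the ranks are split into groups of size k along one digit of a mixed-radix rank index. Every group member cuts the region it currently owns into k pieces. Member j then gathers piece j from each group member and composites those pieces with the Porter-Duff "over" operator. `radix_k_cost` is an idealized per-rank communication model that ignores peer-to-peer topology and contention.

```python
from typing import List, Optional, Sequence, Tuple

# Premultiplied RGBA pixel; the flat 1-D pixel layout is a simplifying assumption.
Pixel = Tuple[float, float, float, float]


def over(front: Pixel, back: Pixel) -> Pixel:
    """Porter-Duff 'over' on premultiplied RGBA: front over back."""
    t = 1.0 - front[3]
    return (front[0] + t * back[0], front[1] + t * back[1],
            front[2] + t * back[2], front[3] + t * back[3])


def composite_sequential(images: Sequence[List[Pixel]]) -> List[Pixel]:
    """Reference result: fold all partial images front-to-back on one device."""
    result = list(images[0])
    for img in images[1:]:
        result = [over(f, b) for f, b in zip(result, img)]
    return result


def radix_k(images: Sequence[List[Pixel]], ks: Sequence[int]) -> List[Pixel]:
    """Simulate radix-k compositing of len(images) ranks in one process.

    Assumes rank order equals front-to-back depth order and that the product
    of ks equals the number of ranks. For 8 ranks, ks=[2, 2, 2] is binary swap
    and ks=[8] is direct send.
    """
    n = len(images)
    prod = 1
    for k in ks:
        prod *= k
    if prod != n:
        raise ValueError("product of ks must equal the number of ranks")
    num_pixels = len(images[0])
    regions = [(0, num_pixels)] * n          # pixel interval each rank owns
    data = [list(img) for img in images]     # pixels of that interval
    stride = 1
    for k in ks:
        new_regions: List[Tuple[int, int]] = []
        new_data: List[List[Pixel]] = []
        for rank in range(n):
            digit = (rank // stride) % k
            base = rank - digit * stride
            group = [base + j * stride for j in range(k)]  # front-to-back order
            lo, hi = regions[rank]           # identical for every group member
            p_lo = lo + (hi - lo) * digit // k
            p_hi = lo + (hi - lo) * (digit + 1) // k
            # "Receive" piece `digit` from each member and composite in depth order.
            acc: Optional[List[Pixel]] = None
            for member in group:
                piece = data[member][p_lo - lo:p_hi - lo]
                acc = piece if acc is None else [over(f, b) for f, b in zip(acc, piece)]
            new_regions.append((p_lo, p_hi))
            new_data.append(acc if acc is not None else [])
        regions, data = new_regions, new_data
        stride *= k
    # Final gather: the regions partition the image, so concatenate them by start index.
    result: List[Pixel] = []
    for _, pixels in sorted(zip(regions, data), key=lambda item: item[0]):
        result.extend(pixels)
    return result


def radix_k_cost(num_pixels: int, ks: Sequence[int]) -> Tuple[int, float]:
    """Idealized per-rank cost model: (messages sent, pixels sent)."""
    messages, volume, region = 0, 0.0, float(num_pixels)
    for k in ks:
        messages += k - 1                 # one piece to every other group member
        volume += (k - 1) * region / k    # each piece holds region / k pixels
        region /= k
    return messages, volume
```

With eight ranks, `ks=[2, 2, 2]` reproduces binary swap and `ks=[8]` reproduces direct send; both return the same image as `composite_sequential`. Under the idealized model, every factorization sends the same volume per rank: (N-1)/N of the image, because the per-round terms telescope. The number of messages, however, ranges from log2 N to N-1 when N is a power of two. Which factorization is fastest therefore depends on how per-message latency compares with bandwidth on the actual inter-GPU links.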
