Parallel Rendering on Hybrid Multi-GPU Clusters

Achieving efficient, scalable parallel rendering for interactive visualization applications on medium-sized graphics clusters remains a challenging problem. Frame rates of up to 60 Hz require a carefully designed and fine-tuned parallel rendering implementation that fits all required operations into the roughly 16 ms time budget available for each rendered frame. Furthermore, modern commodity hardware increasingly adopts a NUMA architecture, in which each processor socket has its own locally attached memory and auxiliary devices such as GPUs and network interfaces are attached directly to one of the processors. Such so-called fat NUMA processing and graphics nodes are increasingly used to build cost-effective hybrid shared/distributed-memory visualization clusters. In this paper we present a thorough analysis of the asynchronous parallelization of the rendering stages, and we derive and implement important optimizations to achieve highly interactive frame rates on such hybrid multi-GPU clusters. Our evaluation uses both a benchmark program and a real-world scientific application for visualizing, navigating and interacting with simulations of cortical neuron circuit models.
