A configurable algorithm for parallel image-compositing applications

Collective communication operations can dominate the cost of large-scale parallel algorithms. Image compositing in parallel scientific visualization is one such reduction operation. We present a new algorithm, Radix-k, that in many cases outperforms existing compositing algorithms. It does so through a set of configurable parameters, the radices, that determine the number of communication partners in each message round. The algorithm embodies and unifies binary swap and direct-send, two of the best-known compositing methods, and enables numerous other configurations through appropriate choices of radices. While the algorithm is not tied to a particular computing architecture or network topology, the selection of radices allows Radix-k to take advantage of new supercomputer interconnect features such as multiporting. We show scalability across image size and system size, including both power-of-two and non-power-of-two process counts.
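
To make the role of the radices concrete, the following is a minimal sketch of how a list of radices could map each process to its communication partners in every round. It assumes the process count equals the product of the radices and that groups are formed from mixed-radix digits of the rank; the function name `round_partners` and this particular group layout are illustrative assumptions, not the paper's implementation.

```python
from math import prod


def round_partners(rank, radices):
    """Return, for each round, the ranks that `rank` exchanges data with.

    Assumption: the number of processes is prod(radices), and in round i
    a group consists of the k_i ranks that differ only in the i-th
    mixed-radix digit of their rank.
    """
    num_procs = prod(radices)
    if not 0 <= rank < num_procs:
        raise ValueError("rank must lie in [0, prod(radices))")

    rounds = []
    stride = 1  # distance between neighbouring group members in this round
    for k in radices:
        digit = (rank // stride) % k  # position of `rank` inside its group
        partners = [rank + (j - digit) * stride for j in range(k) if j != digit]
        rounds.append(partners)
        stride *= k
    return rounds


# All radices equal to 2: one partner per round, as in binary swap.
binary_swap = round_partners(5, [2, 2, 2])

# A single radix equal to the process count: one round in which every
# process communicates with all others, as in direct-send.
direct_send = round_partners(5, [8])

# An intermediate configuration: 3 partners in round one, 1 in round two.
mixed = round_partners(5, [4, 2])
```

Under these assumptions, each round i involves k_i - 1 partners per process, so the choice of radices trades the number of rounds against the number of simultaneous messages per round.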
