Comparison via parallel performance models of angular and spatial domain decompositions for solving neutral particle transport problems

Abstract A previously reported parallel performance model for angular domain decomposition (ADD) of the discrete ordinates approximation for solving multidimensional neutral particle transport problems is revisited for stronger validation. Three communication schemes (native MPI, the bucket algorithm, and the distributed bucket algorithm) are included in the validation exercise, which is successfully conducted on a Beowulf cluster. The parallel component of the performance model is largely independent of the communication scheme, in contrast with the communication component, which depends strongly on the global reduce algorithm. Correct trends for each component and each communication scheme are measured for the Arbitrarily High Order Transport (AHOT) code, thus validating the performance models. Furthermore, extensive experiments illustrate the superiority of the bucket algorithm, in the sense that it incurs a smaller communication penalty than the native MPI and distributed bucket algorithms. The primary question addressed in this work is: for a given problem size, which domain decomposition scheme, angular or spatial, is best suited to parallelize discrete ordinates methods on a specific computational platform? We address this question for three-dimensional applications via parallel performance models for the above-mentioned ADD and a previously constructed and validated spatial domain decomposition (SDD) model. The constructed parallel performance models include parameters specifying the problem size and system performance. We conclude that for large problems the parallel component dwarfs the communication component even on moderately large numbers of processors. The main advantages of SDD are (a) scalability to higher numbers of processors, of the order of the number of computational cells; (b) a smaller memory requirement; and (c) better performance than ADD on high-end platforms and large numbers of processors. On the other hand, the main advantages of ADD are (a) perfect load balance; (b) simple implementation, even on unstructured grids; and (c) better performance than SDD on medium- and low-end platforms and large numbers of discrete ordinates. It follows that programmers and users of discrete ordinates codes must carefully select the appropriate domain decomposition method for the class of problems and multiprocessor platforms they wish to target.
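To make the split between a parallel component and a communication component concrete, the following is a minimal Python sketch of a toy ADD cost model. It is not the model validated in this work: the names `Problem`, `Machine`, and `add_components`, the three machine parameters, and the ring-style global reduce cost (2(P - 1) steps, each moving 1/P of the per-cell flux array) are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    cells: int    # number of spatial cells
    angles: int   # number of discrete ordinates


@dataclass(frozen=True)
class Machine:
    t_update: float  # seconds per cell-angle update (assumed parameter)
    latency: float   # startup cost per message, in seconds (assumed parameter)
    t_word: float    # seconds to transfer one word (assumed parameter)


def add_components(problem, machine, procs):
    """Return (parallel, communication) seconds per iteration for a toy ADD model."""
    # Each processor sweeps the full mesh for its share of the angles.
    angles_per_proc = math.ceil(problem.angles / procs)
    parallel = problem.cells * angles_per_proc * machine.t_update
    if procs == 1:
        return parallel, 0.0
    # Assumed ring-style global reduce of one flux value per cell:
    # 2 * (P - 1) steps, each moving cells / P words.
    steps = 2 * (procs - 1)
    words_per_step = problem.cells / procs
    communication = steps * (machine.latency + words_per_step * machine.t_word)
    return parallel, communication


if __name__ == "__main__":
    # Hypothetical problem size and machine parameters, chosen only for illustration.
    problem = Problem(cells=64 ** 3, angles=80)
    machine = Machine(t_update=1e-7, latency=5e-5, t_word=1e-8)
    for procs in (1, 2, 4, 8, 16):
        parallel, communication = add_components(problem, machine, procs)
        print(f"P={procs:2d}  parallel={parallel:.4e} s  communication={communication:.4e} s")
```

In this sketch the parallel term shrinks as processors are added while the communication term does not, which is the kind of trade-off a performance model with problem-size and system-performance parameters is meant to expose. A comparable SDD term would need its own assumptions and is omitted here.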
