Distributed processor Monte Carlo: MCNP results on a 16-node IBM cluster

The advent of high-performance computer systems has brought to maturity programming concepts such as vectorization, multiprocessing, and multitasking. Although there are many schools of thought as to the most significant factor in obtaining order-of-magnitude increases in performance, such speedup can only be achieved by integrating the computer system and the application code. Vectorization leads to faster manipulation of arrays by overlapping instruction CPU cycles. Discrete ordinates codes, which require the solution of large matrix systems, have proved to be major beneficiaries of vectorization. Monte Carlo transport, on the other hand, typically contains numerous logic statements and requires extensive redevelopment to benefit from vectorization. Multiprocessing and multitasking provide additional CPU cycles via multiple processors. Such systems are generally designed with either common memory access or distributed memory access. In both cases, the theoretical speedup, as a function of the number of processors (P) and the fraction of task time that multiprocesses (f), can be formulated using Amdahl's Law: S(f, P) = 1/(1 - f + f/P). For most applications, however, this theoretical limit cannot be achieved because of additional terms not included in Amdahl's Law. Monte Carlo transport is a natural candidate for multiprocessing, since the particle tracks are generally independent and the precision of the result increases as the square root of the number of particles tracked.
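
As a rough illustration of the speedup bound quoted above, the following minimal Python sketch evaluates S(f, P) for P = 16, the node count named in the title. The function name `amdahl_speedup` and the sample values of f are assumptions chosen for illustration; they are not taken from the MCNP results.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Theoretical speedup S(f, P) = 1 / (1 - f + f / P).

    f is the fraction of task time that multiprocesses (0 <= f <= 1)
    and p is the number of processors (p >= 1).
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("P must be at least 1")
    return 1.0 / (1.0 - f + f / p)


if __name__ == "__main__":
    # Hypothetical parallel fractions, not measured values.
    for f in (0.50, 0.90, 0.99, 1.00):
        print(f"f = {f:.2f}: S(f, 16) = {amdahl_speedup(f, 16):.2f}")
```

Under this formula, a code in which 90% of the task time multiprocesses is bounded at a speedup of 6.4 on 16 processors, and only f = 1 yields the full factor of 16. This is why the parallel fraction, rather than the processor count alone, governs the achievable gain.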