On the Parallel Implementation of Quasi-Monte Carlo Algorithms

Quasi-Monte Carlo algorithms use deterministic low-discrepancy sequences to increase the rate of convergence of stochastic simulation algorithms. Such algorithms are widely applicable and consume a large share of the computational time on advanced HPC systems. Recent advances in HPC increasingly rely on accelerators and similar devices that improve energy efficiency and offer better performance for certain types of computations. The Xeon Phi coprocessors combine efficient vector floating-point computation with a familiar operational and development environment. One potentially difficult part of converting a Monte Carlo algorithm into a quasi-Monte Carlo one is the generation of the low-discrepancy sequences. On specialized equipment such as the Xeon Phi, memory becomes a more valuable resource because of the large number of computational cores. To allow quasi-Monte Carlo algorithms to make use of hybrid OpenMP+MPI programming, we implemented generation routines that save both memory space and memory bandwidth, with the aim of widening the applicability of quasi-Monte Carlo algorithms in environments with an extremely large number of computational elements. We present our implementation and compare it with regular Monte Carlo using a popular pseudorandom number generator, demonstrating the applicability and advantages of our approach.
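
For readers unfamiliar with the idea, the following is a minimal illustrative sketch of how a low-discrepancy sequence can replace pseudorandom numbers in a simple integration task. It is not the generation routine described in this work: the base-2 van der Corput sequence, the integrand x^2, the sample size, the seed, and all function names are assumptions chosen only for illustration.

```python
import random


def van_der_corput(n: int, base: int = 2) -> float:
    """Return the n-th term of the van der Corput sequence in the given base.

    The digits of n in `base` are mirrored around the radix point,
    e.g. n = 3 = 11 (binary) maps to 0.11 (binary) = 0.75.
    """
    value, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        value += digit / denom
    return value


def estimate_integral(points):
    """Average f(x) = x*x over the points; the exact integral on [0, 1] is 1/3."""
    points = list(points)
    return sum(x * x for x in points) / len(points)


N = 4096  # illustrative sample size (a power of two)

# Quasi-Monte Carlo: deterministic low-discrepancy points.
qmc_estimate = estimate_integral(van_der_corput(i) for i in range(N))

# Plain Monte Carlo: pseudorandom points from Python's built-in generator.
rng = random.Random(12345)  # fixed seed, chosen arbitrarily
mc_estimate = estimate_integral(rng.random() for _ in range(N))

qmc_error = abs(qmc_estimate - 1 / 3)
mc_error = abs(mc_estimate - 1 / 3)
print(f"QMC error: {qmc_error:.2e}, MC error: {mc_error:.2e}")
```

Each point in this sketch is computed from its index alone, so no state or table needs to be stored or communicated between workers. This is only a toy example and says nothing about the memory or bandwidth characteristics of the routines presented here.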