Algorithmic and language-based optimization of Marsa-LFIB4 pseudorandom number generator using OpenMP, OpenACC and CUDA

Abstract. This paper presents new high-performance implementations of Marsa-LFIB4, a high-quality multiple recursive pseudorandom number generator. We propose an algorithmic approach that combines language-based vectorization techniques with a new divide-and-conquer parallel method that exploits the special sparse structure of the matrix obtained from the recursive formula defining the generator. Our portable OpenACC implementation achieves performance comparable to that of our CUDA-based and OpenMP-based implementations on GPUs and multicore CPUs, respectively.
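For orientation, the recursive formula mentioned in the abstract can be sketched in plain sequential C. This is a minimal illustration, not the optimized implementation the paper describes. It assumes the commonly cited definition of Marsaglia's LFIB4, x(n) = x(n-256) + x(n-179) + x(n-119) + x(n-55) mod 2^32. The names `lfib4_fill` and `LFIB4_LAG` are hypothetical, and the caller is assumed to supply the 256 seed values.

```c
#include <stddef.h>
#include <stdint.h>

#define LFIB4_LAG 256 /* longest lag of the recurrence (assumed definition) */

/*
 * Hypothetical helper: given seed values in x[0 .. LFIB4_LAG-1],
 * fill x[LFIB4_LAG .. n-1] using the four-term lagged Fibonacci recurrence.
 * Does nothing when n <= LFIB4_LAG.
 */
static void lfib4_fill(uint32_t *x, size_t n)
{
    for (size_t i = LFIB4_LAG; i < n; i++) {
        /* Unsigned 32-bit addition wraps around, which gives the mod 2^32. */
        x[i] = x[i - 256] + x[i - 179] + x[i - 119] + x[i - 55];
    }
}
```

Because the shortest lag in this form of the recurrence is 55, each new value depends only on values at least 55 positions back, so up to 55 consecutive values can be computed independently of one another. This is the kind of structure that vectorization can take advantage of.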
