Software/Hardware Framework for Generating Parallel Long-Period Random Numbers Using the WELL Method

The Well Equidistributed Long-period Linear (WELL) algorithm has been shown to have better characteristics than the Mersenne Twister (MT), one of the most widely used long-period pseudo-random number generators (PRNGs). In this paper, we propose a hardware architecture for the efficient implementation of WELL. Our design achieves a throughput of one sample per cycle and runs at up to 449.4 MHz on a Xilinx XC6VLX240T FPGA. This performance is 7.6 times faster than a dedicated software implementation and is comparable to that of an MT hardware generator built on the same device. The design occupies 633 LUTs, 537 flip-flops, and 4 BRAMs, which is only 0.5% of the device. Furthermore, we design a software/hardware framework that is capable of dividing the WELL stream into an arbitrary number of independent parallel sub-streams. With support from software, this framework achieves a speedup roughly proportional to the number of parallel cores. The quality of the random numbers generated by our design is verified with the standard statistical test suites Diehard and TestU01. We also apply our framework to a Monte Carlo simulation for estimating π. Experimental results verify the correctness of our framework as well as the better characteristics of the WELL algorithm.
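As a rough illustration of the Monte Carlo application mentioned above, the following Python sketch estimates π by splitting the sampling work across several sub-streams and combining the hit counts. It is only a software analogue under stated assumptions: Python's built-in `random.Random` (a Mersenne Twister, not WELL) stands in for the generator, and giving each sub-stream a distinct seed is a simplification that does not reproduce the framework's stream-division mechanism. The function names `estimate_pi_substream` and `estimate_pi` and the `base_seed` parameter are hypothetical.

```python
import random


def estimate_pi_substream(seed, n_samples):
    """Count points falling inside the unit quarter circle for one sub-stream.

    Assumption: random.Random (Mersenne Twister) stands in for a WELL sub-stream.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_streams, samples_per_stream, base_seed=12345):
    """Combine the hit counts of all sub-streams into one estimate of pi.

    Assumption: distinct seeds are used as a simple stand-in for independent
    sub-streams; this is not the stream division described in the paper.
    """
    total_hits = sum(
        estimate_pi_substream(base_seed + i, samples_per_stream)
        for i in range(n_streams)
    )
    total_samples = n_streams * samples_per_stream
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * total_hits / total_samples


if __name__ == "__main__":
    print(estimate_pi(n_streams=4, samples_per_stream=50_000))
```

Because each sub-stream only contributes a hit count, the sub-streams can be evaluated on separate cores and summed afterwards, which is why this kind of workload can scale roughly with the number of parallel generators.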