A Graphical Processing Unit Accelerated NORmal to Anything Algorithm for High Dimensional Multivariate Simulation

Many complex real-world systems can be represented as correlated high-dimensional vectors (up to 20,501 dimensions in this paper). Univariate analysis is simpler, but it does not account for correlations between variables. This omission often misleads researchers with results that rest on unrealistic assumptions. Because generating large correlated data sets is time-consuming and resource-intensive, we propose a graphical processing unit (GPU) accelerated version of the established NORmal To Anything (NORTA) algorithm. NORTA involves many independent, parallelizable operations, which motivated us to develop a Compute Unified Device Architecture (CUDA) implementation for NVIDIA GPUs. NORTA begins by simulating independent standard normal vectors and then transforms them into correlated vectors with arbitrary marginal distributions (heterogeneous random variables). In our benchmark studies using an NVIDIA Tesla card, the speedup over a sequential NORTA implementation in R (R-NORTA) peaks at 19.6× for 2000 simulated random vectors of dimension 5000. Moreover, GPU-NORTA was 2093× faster than a commonly used R package for multivariate simulation (the copula package) for 2000 simulated random vectors of dimension 20,501. Our study serves as a preliminary proof of concept, with opportunities for further optimization, implementation, and additional features.
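The NORTA pipeline summarized above can be illustrated with a short sequential sketch. It is written in plain Python (standard library only) rather than R or CUDA, and the names `cholesky`, `norta_sample`, `exponential_icdf`, and `bernoulli_icdf` are hypothetical, not taken from the paper's code. It assumes the correlation matrix of the underlying normal vector is already given and positive definite. Full NORTA also requires choosing that matrix so the output vectors reach a target correlation, and this sketch omits that step. Each vector passes through four stages: independent standard normals, correlation via a Cholesky factor, the standard normal CDF to obtain uniforms, and each marginal's inverse CDF.

```python
import math
import random
from statistics import NormalDist


def cholesky(corr):
    """Return lower-triangular L with L @ L^T == corr (corr must be positive definite)."""
    n = len(corr)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = corr[i][i] - s
                if d <= 0.0:
                    raise ValueError("correlation matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]
    return L


def norta_sample(n_vectors, normal_corr, inverse_cdfs, seed=None):
    """Sequential NORTA sketch: n_vectors correlated vectors with the given marginals.

    normal_corr is the correlation matrix of the underlying normal vector; it is
    used as-is (no adjustment toward a target output correlation).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    L = cholesky(normal_corr)
    d = len(normal_corr)
    eps = 1e-12  # keep uniforms strictly inside (0, 1) for the inverse CDFs
    samples = []
    # Each vector is independent of the others: this loop is the part a
    # GPU version could spread across threads.
    for _ in range(n_vectors):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # 1. independent N(0, 1)
        w = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)]  # 2. correlated normals
        u = [min(max(std_normal.cdf(wi), eps), 1.0 - eps) for wi in w]  # 3. uniforms
        samples.append([inverse_cdfs[i](u[i]) for i in range(d)])  # 4. target marginals
    return samples


def exponential_icdf(rate):
    """Inverse CDF of an exponential distribution with the given rate."""
    return lambda u: -math.log1p(-u) / rate


def bernoulli_icdf(p):
    """Inverse CDF of a Bernoulli(p) variable: 1.0 with probability p."""
    return lambda u: 1.0 if u > 1.0 - p else 0.0


# Hypothetical 3-dimensional example with heterogeneous marginals:
# standard normal, exponential(rate=2), and Bernoulli(0.3).
normal_corr = [[1.0, 0.6, 0.3],
               [0.6, 1.0, 0.5],
               [0.3, 0.5, 1.0]]
marginals = [NormalDist().inv_cdf, exponential_icdf(2.0), bernoulli_icdf(0.3)]
samples = norta_sample(5000, normal_corr, marginals, seed=42)
```

Because every output vector, and every element-wise CDF transform within it, is computed independently, the loop in `norta_sample` is the kind of work a GPU implementation could assign to many threads at once. How GPU-NORTA actually partitions this work is not described here.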
