Accelerating Kernel Density Estimation on the GPU Using the CUDA Framework

The main problem with kernel density estimation methods is their heavy computational cost, especially for large data sets. One way to accelerate these methods is parallel processing. Recent advances in parallel processing have focused on Graphics Processing Units (GPUs) programmed with the Compute Unified Device Architecture (CUDA) model. In this work we discuss a naive and two optimised CUDA algorithms for two kernel estimation methods: univariate and multivariate. The optimised algorithms rely on shared memory tiles and loop unrolling. We also present exploratory experimental results for the proposed CUDA algorithms across several parameter values: number of threads per block, tile size, loop unroll level, number of variables, and data (sample) size. The experimental results show significant performance gains for all proposed CUDA algorithms over the serial CPU version, and small speed-ups of the two optimised CUDA algorithms over the naive GPU algorithms. Finally, based on the extended performance results, we draw general conclusions about all proposed CUDA algorithms for some of these parameters.
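
To make the computational cost concrete, the following is a minimal serial sketch of a univariate kernel density estimate, written in Python rather than CUDA. It is not the paper's implementation: the Gaussian kernel, the function name `gaussian_kde`, and the bandwidth parameter `h` are assumptions chosen for illustration. The nested loop shows why the cost grows with the product of the number of evaluation points and the sample size; in a naive GPU version, each iteration of the outer loop could plausibly be assigned to one thread.

```python
import math


def gaussian_kde(samples, points, h):
    """Evaluate a univariate Gaussian kernel density estimate.

    samples: observed data values (assumed non-empty)
    points:  locations at which the density is estimated
    h:       bandwidth (assumed > 0); hypothetical parameter name
    """
    n = len(samples)
    # Normalising constant 1 / (n * h * sqrt(2 * pi)) for the Gaussian kernel.
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    estimates = []
    for x in points:          # independent per point: a natural unit of parallel work
        total = 0.0
        for xi in samples:    # every sample contributes to every evaluation point
            u = (x - xi) / h
            total += math.exp(-0.5 * u * u)
        estimates.append(norm * total)
    return estimates
```

Because each evaluation point reads the entire sample, a GPU version would reload the same sample values in every thread, which is the kind of reuse that shared memory tiles are meant to exploit.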
