Impact of data layouts on the efficiency of GPU-accelerated IDW interpolation

This paper evaluates the impact of different data layouts on the computational efficiency of the GPU-accelerated Inverse Distance Weighting (IDW) interpolation algorithm. First, we redesign and improve our previous GPU implementation, which exploited CUDA dynamic parallelism (CDP). Then we develop three GPU implementations, i.e., the naive version, the tiled version, and the improved CDP version, each based upon five data layouts: the Structure of Arrays (SoA), the Array of Structures (AoS), the Array of aligned Structures (AoaS), the Structure of Arrays of aligned Structures (SoAoS), and the Hybrid layout. We also carry out several groups of experimental tests to evaluate the impact. Experimental results show that the layouts AoS and AoaS achieve better performance than the layout SoA for both the naive version and the tiled version, while the layout SoA is the best choice for the improved CDP version. We also observe that the two combined data layouts (the SoAoS and the Hybrid) yield no notable performance gains over the three basic layouts. For practical applications, we recommend the layout AoaS, since the tiled version is the fastest of the three versions and AoaS is among the best-performing layouts for it. The source code of all implementations is publicly available.
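To make the three basic layouts concrete, the following is a minimal host-side C++ sketch, not the paper's GPU code. The type and function names (`PointAoS`, `PointAoaS`, `PointsSoA`, `idw_aos`, `idw_soa`) are hypothetical, and the power parameter of 2 and the 16-byte alignment are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of Structures (AoS): the fields of one data point are adjacent in memory.
struct PointAoS { float x, y, z; };

// Array of aligned Structures (AoaS): same fields, with each record
// aligned (and therefore padded) to 16 bytes. The alignment value is an assumption.
struct alignas(16) PointAoaS { float x, y, z; };

// Structure of Arrays (SoA): one contiguous array per field.
struct PointsSoA { std::vector<float> x, y, z; };

// IDW prediction at (qx, qy) over an AoS data set.
// Assumes the power parameter is 2, so each weight is 1 / d^2.
inline float idw_aos(const std::vector<PointAoS>& pts, float qx, float qy) {
    assert(!pts.empty());
    float num = 0.0f, den = 0.0f;
    for (const PointAoS& p : pts) {
        const float dx = qx - p.x, dy = qy - p.y;
        const float d2 = dx * dx + dy * dy;
        if (d2 == 0.0f) return p.z;  // query coincides with a data point
        const float w = 1.0f / d2;
        num += w * p.z;
        den += w;
    }
    return num / den;
}

// The same computation over an SoA data set; only the memory access pattern differs.
inline float idw_soa(const PointsSoA& pts, float qx, float qy) {
    assert(!pts.x.empty());
    float num = 0.0f, den = 0.0f;
    for (std::size_t i = 0; i < pts.x.size(); ++i) {
        const float dx = qx - pts.x[i], dy = qy - pts.y[i];
        const float d2 = dx * dx + dy * dy;
        if (d2 == 0.0f) return pts.z[i];
        const float w = 1.0f / d2;
        num += w * pts.z[i];
        den += w;
    }
    return num / den;
}
```

Both functions compute the same weighted average. The layouts differ only in where the bytes of `x`, `y`, and `z` sit in memory, which is the variable the experiments isolate.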
