Multi-GPUs Gaussian filtering for real-time big data processing

Gaussian filtering is used extensively in surface metrology. However, computing performance becomes the core bottleneck for applications built on the Gaussian filtering algorithm when they must process large-scale and/or real-time data. Although researchers have accelerated the Gaussian filtering algorithm with a GPU (Graphics Processing Unit), a single GPU still fails to meet the large-scale and real-time requirements of micro- and nano-scale surface texture measurement. To remove this bottleneck, this paper proposes a single-node, multi-GPU computing framework that accelerates the 2D Gaussian filtering algorithm. The framework integrates a multi-level spatial domain decomposition method with the CUDA stream mechanism to overlap the two most time-consuming steps, namely data transfer and GPU kernel execution. This increases concurrency and reduces the overall running time. The proposed framework is tested and evaluated against three other conventional solutions using large-scale measured data extracted from real mechanical surfaces, and the results show that it achieves higher efficiency. The results also demonstrate that the framework satisfies the real-time and big-data requirements of micro- and nano-scale surface texture measurement.
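To illustrate why spatial domain decomposition lets several GPUs filter one surface independently, the following CPU-only Python sketch splits a height map into row bands, pads each band with halo rows as wide as the kernel radius, filters each band on its own (as one GPU would), and stitches the band interiors back together. It assumes the ISO 16610-21 Gaussian weighting function truncated at ±λc, edge replication at the surface border, and a single level of row-band decomposition rather than the paper's multi-level scheme. All function names are hypothetical. The sketch also includes an idealised timing model of the copy/kernel overlap that CUDA streams make possible, assuming separate hardware engines for host-to-device copies, kernel execution, and device-to-host copies.

```python
import math

# Assumption: ISO 16610-21 style Gaussian weighting function,
# s(x) = exp(-pi * (x / (alpha * cutoff))**2), alpha = sqrt(ln 2 / pi).
ALPHA = math.sqrt(math.log(2) / math.pi)


def gaussian_weights(cutoff, spacing):
    """Hypothetical helper: discrete 1D Gaussian weights truncated at +/- cutoff."""
    radius = int(cutoff / spacing)
    raw = [math.exp(-math.pi * (k * spacing / (ALPHA * cutoff)) ** 2)
           for k in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]  # normalise so a flat surface is unchanged


def filter_rows(grid, w):
    """Convolve each row with w, replicating edge values (a row-local pass)."""
    r = len(w) // 2
    out = []
    for row in grid:
        n = len(row)
        out.append([sum(w[k + r] * row[min(max(j + k, 0), n - 1)]
                        for k in range(-r, r + 1)) for j in range(n)])
    return out


def filter_cols(grid, w):
    """Convolve each column with w, replicating edge values."""
    r = len(w) // 2
    m = len(grid)
    return [[sum(w[k + r] * grid[min(max(i + k, 0), m - 1)][j]
                 for k in range(-r, r + 1)) for j in range(len(grid[0]))]
            for i in range(m)]


def gaussian_2d(grid, w):
    """Separable 2D Gaussian filter: row pass, then column pass."""
    return filter_cols(filter_rows(grid, w), w)


def tiled_gaussian_2d(grid, w, num_devices):
    """Hypothetical decomposition: one row band per device, each band padded
    with up to r halo rows on both sides, filtered independently, and trimmed
    back to its interior rows before stitching."""
    r = len(w) // 2
    m = len(grid)
    bounds = [m * d // num_devices for d in range(num_devices + 1)]
    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        start, stop = max(0, lo - r), min(m, hi + r)  # band plus halo rows
        band = gaussian_2d(grid[start:stop], w)       # work for one device
        result.extend(band[lo - start:hi - start])    # keep interior rows only
    return result


def pipelined_time(chunks, h2d, kernel, d2h):
    """Idealised makespan of chunks processed back to back versus with the
    H2D copy, kernel, and D2H copy of successive chunks overlapped."""
    serial = chunks * (h2d + kernel + d2h)
    overlapped = h2d + kernel + d2h + (chunks - 1) * max(h2d, kernel, d2h)
    return serial, overlapped
```

For example, `pipelined_time(4, 2.0, 3.0, 2.0)` returns `(28.0, 16.0)`: once the pipeline is full, each additional chunk costs only the slowest stage. The halo width equals the kernel radius because that is the farthest any output row reaches, so each band's interior matches the whole-surface result and bands can be assigned to different devices without exchanging data during filtering. On a GPU with a single copy engine, host-to-device and device-to-host copies serialise, and the overlap model above is optimistic.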
