Analysis of performance enhancement on graphic processor based heterogeneous architecture: A CUDA and MATLAB experiment

Multiprocessors, multicore processors, clusters, and heterogeneous systems are now the most popular architectures for high-performance computing. System designers have taken different approaches to enhance system performance, such as increasing CPU clock frequency from MHz to GHz and adding more CPU cores, i.e., moving from single-core processors to dual-core, quad-core, hexa-core, octa-core, ten-core, and larger processors. Still, multicore processing creates challenges of its own: each extra core increases processor size and power consumption. Meanwhile, General Purpose Graphics Processing Units (GPGPUs) have been designed and implemented that contain hundreds of cores, with more arithmetic logic units and control units. These GPGPUs can be used alongside the CPU for heterogeneous computing, enhancing system performance for selected applications through data parallelism. A heterogeneous programming environment that includes processors such as GPGPUs in addition to the CPU can thus improve the execution performance of computationally intensive programs. It is therefore necessary for the programmer to run and analyze selected computationally intensive programs on both homogeneous and heterogeneous programming platforms. The homogeneous programming environment uses a multicore CPU, whereas the heterogeneous programming environment uses different processors, such as GPGPUs, Field Programmable Gate Arrays (FPGAs), and Digital Signal Processors (DSPs), in addition to the CPU.
Hence, the programmer needs to write code that uses both the CPU and the other processors through a heterogeneous software environment, such as parallel MATLAB with GPU-enabled functions, MATLAB-supported CUDA kernels, or CUDA C, so that parallel code achieves higher performance in the heterogeneous programming environment than the homogeneous (sequential) programming approach using only the CPU.
