Combining high productivity and high performance in image processing using Single Assignment C on multi-core CPUs and many-core GPUs

We address the challenge of developing parallel industrial high-performance inspection systems by comparing a conventional, manually parallelized approach with an auto-parallelizing technique. To this end, we introduce the functional array programming language Single Assignment C (SAC), which relies on a hardware virtualization concept to generate parallel machine code automatically for multi-core CPUs and many-core GPUs. Software engineering aspects such as programmability, productivity, understandability, and maintainability, as well as the resulting performance gains, are discussed from a developer's point of view. Using several illustrative benchmark examples from the fields of image processing and machine learning, we analyze the relationship between runtime performance and development efficiency.
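
As a brief illustration of the programming model (a minimal sketch in SAC's with-loop notation as used in the SAC literature; the function name and the particular smoothing filter are our own example, not taken from the paper), an image-processing kernel is expressed as a single data-parallel array operation that the compiler can map to multi-threaded CPU code or GPU kernels without source changes:

  /* Illustrative sketch: average each inner pixel with its four
     direct neighbours; modarray copies border pixels unchanged. */
  double[.,.] smooth(double[.,.] img)
  {
    res = with {
            ([1,1] <= iv < shape(img) - [1,1]) :
              (img[iv] + img[iv + [1,0]] + img[iv - [1,0]]
                       + img[iv + [0,1]] + img[iv - [0,1]]) / 5.0;
          } : modarray(img);
    return res;
  }

The same source compiles unchanged to sequential, multi-threaded, or GPU code; this is the hardware virtualization idea referred to above.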
