On The Efficiency of Heterogeneous System Architecture for Image Processing

Graphics Processing Unit (GPU) based image processing algorithms have previously been developed to exploit the highly parallel nature of GPUs. However, these algorithms still suffer from high programming complexity, relatively low device utilisation, and difficulty of integration into larger systems. In this paper, a set of image processing modules is developed to take advantage of the computational characteristics of a System on a Chip (SoC) that contains both a GPU and a Central Processing Unit (CPU) with fine-grained shared virtual memory capabilities. The use of shared memory simplifies design and removes the latency and bandwidth constraints associated with discrete GPUs on the Peripheral Component Interconnect Express (PCIe) bus. The modules feature a simple, composite design that improves upon previously developed algorithms by running each discrete stage of an algorithm on the portion of the SoC best suited to it. This allows greater efficiency and lower code complexity than more expensive discrete-GPU-based alternatives.
