Task Allocation with Algorithm Transformation for Reducing Data-Transfer Bottlenecks in Heterogeneous Multi-Core Processors: A Case Study of HOG Descriptor Computation

Heterogeneous multi-core processors are attractive for media processing applications because they combine the strengths of different cores to improve overall performance. However, data-transfer bottlenecks and restrictions on task allocation caused by accelerator-incompatible operations prevent us from exploiting the full potential of these processors. This paper presents a task allocation method based on algorithm transformation that increases the freedom of task allocation. We use approximation methods such as CORDIC algorithms to map accelerator-incompatible operations onto accelerator cores. In experiments on HOG descriptor computation, the proposed method reduces the data transfer time by more than 82% and the total processing time by more than 79% compared with the conventional task allocation method.
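As an illustration of the kind of algorithm transformation described above, the following minimal sketch shows vectoring-mode CORDIC approximating a gradient magnitude and orientation, the two per-pixel HOG quantities that normally require a square root and an arctangent. It is not the paper's implementation: the function name `cordic_vectoring`, the iteration count, and the choice of these two operations as the accelerator-incompatible ones are assumptions made for illustration. Floating-point multiplication by 2^-i stands in for the bit shifts that a fixed-point accelerator would use.

```python
import math

def cordic_vectoring(gx, gy, iterations=16):
    """Approximate (magnitude, angle in radians) of the vector (gx, gy).

    Hypothetical helper for illustration; uses only add/subtract and
    scaling by powers of two inside the loop, as a shift-add core would.
    """
    x, y, angle = float(gx), float(gy), 0.0

    # CORDIC converges only in the right half-plane, so flip vectors
    # with negative x by 180 degrees and record that offset.
    if x < 0:
        x, y = -x, -y
        angle = math.pi if gy >= 0 else -math.pi

    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i                # a right shift by i in fixed point
        table_angle = math.atan(step)   # a precomputed table entry in hardware
        if y > 0:
            # Rotate clockwise to drive y toward zero.
            x, y = x + y * step, y - x * step
            angle += table_angle
        else:
            # Rotate counter-clockwise to drive y toward zero.
            x, y = x - y * step, y + x * step
            angle -= table_angle
        gain *= math.sqrt(1.0 + step * step)  # accumulated CORDIC gain (constant per iteration count)

    # After the loop x holds gain * magnitude; divide the gain back out.
    return x / gain, angle


if __name__ == "__main__":
    magnitude, orientation = cordic_vectoring(3, 4)
    print(magnitude, orientation, math.hypot(3, 4), math.atan2(4, 3))
```

Because the loop body needs only additions, subtractions, shifts, and a small lookup table, an operation of this form could in principle be placed on an accelerator core that lacks square-root and arctangent units, avoiding a round trip of intermediate data to the CPU. The accuracy depends on the iteration count, which here is an arbitrary choice.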
