Parallelizing Compiler Framework and API for Power Reduction and Software Productivity of Real-Time Heterogeneous Multicores

Heterogeneous multicores have attracted much attention in a wide range of areas as a way to attain high performance while keeping power consumption low. However, heterogeneous multicores are very difficult to program, and the resulting long application development periods lower product competitiveness. To overcome this situation, this paper proposes a compilation framework that bridges the gap between programmers and heterogeneous multicores. In particular, it describes a compilation framework based on the OSCAR compiler. The framework realizes coarse grain task parallel processing, data transfer using a DMA controller, and power reduction control from user programs with DVFS and clock gating on various heterogeneous multicores from different vendors. This paper also evaluates the processing performance and power reduction achieved by the proposed framework on RP-X, a newly developed 15-core heterogeneous multicore chip that integrates 8 general purpose processor cores and 3 types of accelerator cores and was developed by Renesas Electronics, Hitachi, Tokyo Institute of Technology and Waseda University. The framework attains speedups of up to 32x for an optical flow program with eight general purpose processor cores and four DRP (Dynamically Reconfigurable Processor) accelerator cores, relative to sequential execution on a single processor core, and an 80% power reduction for real-time AAC encoding.
