OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers

The OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores so that the OSCAR parallelizing compiler can generate parallel programs for multicores from different vendors. The OSCAR API was developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in a METI/NEDO project entitled “Multicore Technology for Realtime Consumer Electronics.” With the OSCAR API as the interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables the following on various embedded multicores: hierarchical multigrain parallel processing with memory optimization under capacity restrictions for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating. In addition, a parallelized program automatically generated by the OSCAR compiler with the OSCAR API can be compiled by ordinary OpenMP compilers, since the OSCAR API is designed on a subset of OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. The performance of the OSCAR compiler and the OSCAR API is evaluated on an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and RP2, a newly developed consumer electronics multicore chip by Renesas, Hitachi, and Waseda. The scalability evaluation shows that, on average, the OSCAR compiler with the OSCAR API achieves a 5.8 times speedup over sequential execution on the Power5+ workstation with eight cores and a 2.9 times speedup on RP2 with four cores. In addition, the OSCAR compiler achieves up to a 3.3 times speedup over the IBM XL Fortran compiler on the Power6 SMP server. Owing to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.
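To illustrate the statement that OSCAR API programs can be compiled by ordinary OpenMP compilers, the following is a minimal sketch in C that uses only the standard OpenMP `parallel sections` construct, with one section per core and a fixed assignment of work to each section. It is not output of the OSCAR compiler, and it does not show any OSCAR-specific directives for memory placement, DMA transfer, or power control, since the abstract does not give their syntax. The names `scale_block` and `run_two_core_schedule` and the two-way split are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical coarse-grain task: scale one contiguous block of an array. */
static void scale_block(double *v, size_t lo, size_t hi, double k) {
    for (size_t i = lo; i < hi; ++i) {
        v[i] *= k;
    }
}

/*
 * Hypothetical entry point: statically assign two coarse-grain tasks to two
 * threads using only standard OpenMP. Each section doubles its half of the
 * array and accumulates a partial sum into its own variable, so no
 * synchronization is needed inside the region.
 */
double run_two_core_schedule(double *v, size_t n) {
    size_t half = n / 2;
    double sum_lo = 0.0; /* written only by the first section  */
    double sum_hi = 0.0; /* written only by the second section */

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            scale_block(v, 0, half, 2.0);
            for (size_t i = 0; i < half; ++i) {
                sum_lo += v[i];
            }
        }
        #pragma omp section
        {
            scale_block(v, half, n, 2.0);
            for (size_t i = half; i < n; ++i) {
                sum_hi += v[i];
            }
        }
    }
    /* The implicit barrier at the end of the region makes both sums visible. */
    return sum_lo + sum_hi;
}
```

Built with an OpenMP flag such as `-fopenmp` in GCC, the two sections may run on separate threads; built without it, the pragmas are ignored and the same code runs sequentially with the same result.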
