An efficient protocol with synchronization accelerator for multi-processor embedded systems

Abstract With the proliferation of multi-processor core systems, parallel programming poses a difficult challenge, and current solutions are far from efficient. To ease the difficulty of parallel programming, we propose a scheduler, part of a master–slave RTOS, that efficiently manages the parallel programs running on a multi-processor core system. We also propose an efficient protocol that serves as the interface between the operating system and application programs. This interface protocol runs on a dedicated control subnet to reduce the synchronization overhead between parallel tasks. This overhead has been recognized as one of the most severe limiting factors on the performance of multi-core parallel systems. Experimental results, obtained from register-transfer-level simulations of various benchmark parallel programs, show that the proposed protocol and control subnet can improve system efficiency by up to 33.5%. Because the protocol is designed to be compatible with the minimum subset of the message-passing interface (MPI) functions, it scales well with the number of cores.
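
As a rough illustration of the master–slave, message-passing style of coordination the abstract describes, the sketch below has a master dispatch work to slave threads and gather partial results through blocking send/receive queues. It is not the proposed protocol or scheduler: the names `slave` and `master_sum`, the use of Python threads and queues in place of cores and a control subnet, and the summation workload are all assumptions made for illustration.

```python
import queue
import threading


def slave(rank, inbox, outbox):
    """Hypothetical slave loop: receive a task, compute, send the result back."""
    while True:
        task = inbox.get()              # blocking receive (plays the role of a recv call)
        if task is None:                # sentinel: the master signals shutdown
            break
        lo, hi = task
        outbox.put((rank, sum(range(lo, hi))))  # send the partial result to the master


def master_sum(n, num_slaves=4):
    """Hypothetical master: split sum(range(n)) into chunks, dispatch, gather."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(num_slaves)]
    workers = [
        threading.Thread(target=slave, args=(rank, inboxes[rank], outbox))
        for rank in range(num_slaves)
    ]
    for w in workers:
        w.start()

    # Dispatch one contiguous chunk per slave (ceiling division, clamped to n).
    chunk = (n + num_slaves - 1) // num_slaves
    for rank in range(num_slaves):
        inboxes[rank].put((min(rank * chunk, n), min((rank + 1) * chunk, n)))

    # Gather: the master blocks until every slave has reported once.
    total = sum(outbox.get()[1] for _ in range(num_slaves))

    # Shut the slaves down and wait for them to exit.
    for q in inboxes:
        q.put(None)
    for w in workers:
        w.join()
    return total
```

Every dispatch and gather step here is a synchronization point between the master and a slave, which is the kind of overhead the abstract says the dedicated control subnet is meant to reduce.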