An Efficient Architectural Design of Hardware Interface for Heterogeneous Multi-core System

Managing message passing among processor cores with low overhead is a major challenge now that multi-core systems are the prevailing way to meet high-performance and low-energy demands in both general-purpose and embedded computing. A distributed multi-core system is typically connected by a network-on-chip, which carries the messages passed among cores, both data and synchronization messages. Data transfers are typically large and therefore bandwidth-bound, whereas synchronization messages are very small and primarily latency-bound. Separate networks-on-chip are therefore needed for the two types of message. In this paper we focus on the network that carries synchronization messages. We propose a hardware module, the message passing unit (MPU), to manage synchronization message passing in a heterogeneous multi-core system. Compared with the original single-network approach, this solution effectively reduces run-time object scheduling and synchronization overhead, thereby improving overall system performance.
