C-HEAP: A Heterogeneous Multi-Processor Architecture Template and Scalable and Flexible Protocol for the Design of Embedded Signal Processing Systems

The key issue in the design of Systems-on-a-Chip (SoC) is to trade off efficiency against flexibility, and time to market against cost. Current deep submicron processing technologies enable integration of multiple software-programmable processors (e.g., CPUs, DSPs) and dedicated hardware components into a single cost-efficient IC. Our top-down design methodology with various abstraction levels helps to design these ICs in a reasonable amount of time. This methodology starts with a high-level executable specification and converges towards a silicon implementation. A major task in the design process is to ensure that all components (hardware and software) communicate with each other correctly. In this article, we tackle this problem in the context of the signal processing domain in two ways: we propose a modular, flexible, and scalable heterogeneous multi-processor architecture template based on distributed shared memory, and we present an efficient and transparent protocol for communication and (re)configuration. The protocol implementations have been incorporated in libraries, which allows quick traversal of the various abstraction levels and thereby enables incremental design. The design decisions to be taken at each abstraction level are evaluated by means of (co-)simulation. Prototyping is also used to verify the system's functional correctness. The effectiveness of our approach is illustrated by a design case of a multi-standard video and image codec.
