Data-Reuse-Driven Energy-Aware Cosynthesis of Scratch Pad Memory and Hierarchical Bus-Based Communication Architecture for Multiprocessor Streaming Applications

As technology advances, it becomes feasible to implement large multiprocessor systems-on-chip (MPSoCs) to satisfy the increased performance demands of embedded applications. The increased complexity of these systems leads to increased power consumption. Reducing this consumption is an important task, since the available power may be limited in battery-operated embedded systems. The selection of memory and communication architectures affects the power efficiency of the design. In this paper, we propose a novel approach that enables the energy-aware cosynthesis of both memory and communication architectures for streaming applications. As opposed to earlier techniques, we propose a powerful compile-time analysis of memory access behavior in multiprocessor systems, which adds flexibility in selecting scratch-pad-based memory architectures. We propose and compare three memory/communication synthesis techniques: an optimal mixed integer-linear-programming (MILP)-based cosynthesis technique; an MILP-based traditional two-step synthesis approach, in which memory synthesis and communication synthesis are performed sequentially; and a cosynthesis heuristic that synthesizes energy-efficient hierarchical bus-based communication architectures with guaranteed throughput. Our experimental results on a number of streaming applications show that both the traditional two-step synthesis approach and the heuristic can result in up to 50% higher power consumption than our proposed MILP-based cosynthesis approach. However, on some of the streaming benchmarks, our cosynthesis heuristic found optimal or near-optimal results in a much shorter time than the MILP cosynthesis approach.
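To make the difference between two-step synthesis and cosynthesis concrete, the following is a minimal sketch, not the formulation used in the paper. It replaces the MILP with exhaustive search over a tiny design space, and every name and number in it (`SPM_OPTIONS`, `BUS_OPTIONS`, the energy, traffic, and capacity values) is a hypothetical placeholder chosen only for illustration. The assumption is that a larger scratch-pad memory costs more energy itself but leaves less traffic on the bus, and that a bus is feasible only if its capacity covers that traffic.

```python
from itertools import product

# Hypothetical scratch-pad options (illustrative units, not measured data):
# energy of the memory itself and the bus traffic it leaves behind.
SPM_OPTIONS = {
    "spm_small": {"mem_energy": 50, "traffic": 100},
    "spm_large": {"mem_energy": 80, "traffic": 40},
}

# Hypothetical bus options: energy per unit of traffic and the
# traffic each bus can sustain (stand-in for a throughput guarantee).
BUS_OPTIONS = {
    "bus_shared": {"energy_per_word": 1, "capacity": 60},
    "bus_hier": {"energy_per_word": 2, "capacity": 120},
}


def comm_energy(spm, bus):
    """Bus energy depends on both the memory choice and the bus choice."""
    return SPM_OPTIONS[spm]["traffic"] * BUS_OPTIONS[bus]["energy_per_word"]


def feasible(spm, bus):
    """Throughput constraint: the bus must carry the remaining traffic."""
    return SPM_OPTIONS[spm]["traffic"] <= BUS_OPTIONS[bus]["capacity"]


def total_energy(spm, bus):
    return SPM_OPTIONS[spm]["mem_energy"] + comm_energy(spm, bus)


def two_step():
    """Pick the memory first (memory energy only), then the best feasible bus."""
    spm = min(SPM_OPTIONS, key=lambda s: SPM_OPTIONS[s]["mem_energy"])
    buses = [b for b in BUS_OPTIONS if feasible(spm, b)]
    bus = min(buses, key=lambda b: comm_energy(spm, b))
    return spm, bus, total_energy(spm, bus)


def cosynthesis():
    """Choose memory and bus jointly, minimizing total energy."""
    candidates = [
        (s, b) for s, b in product(SPM_OPTIONS, BUS_OPTIONS) if feasible(s, b)
    ]
    spm, bus = min(candidates, key=lambda sb: total_energy(*sb))
    return spm, bus, total_energy(spm, bus)


if __name__ == "__main__":
    print("two-step:   ", two_step())
    print("cosynthesis:", cosynthesis())
```

With these placeholder values, the two-step flow commits to the cheaper memory and is then forced onto the costlier bus, while the joint search accepts a more expensive memory in exchange for a cheaper feasible bus. The gap in this toy instance says nothing about the magnitudes reported in the abstract; it only shows why fixing the memory before the communication architecture can miss the lower-energy combination.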
