Cache coherence tradeoffs in shared-memory MPSoCs

Shared memory is a common interprocessor communication paradigm for single-chip multiprocessor platforms. Snoop-based cache coherence is a very successful technique that provides a clean shared-memory programming abstraction in general-purpose chip multiprocessors, but there is no consensus on its use in resource-constrained multiprocessor systems-on-chip (MPSoCs) for embedded applications. This work provides a comparative energy and performance analysis of cache-coherence support schemes in MPSoCs. Using a complete multiprocessor simulation platform that relies on accurate, technology-homogeneous power models, we explore different cache-coherent shared-memory communication schemes across a number of cache configurations and workloads.
