Power/Performance Hardware Optimization for Synchronization Intensive Applications in MPSoCs

This paper explores optimization techniques of the synchronization mechanisms for MPSoCs based on complex interconnect (network-on-chip), targeted at future power-efficient systems. The proposed solution is based on the idea of locally performing synchronization operations which require the continuous polling of a shared variable, thus featuring large contention (e.g. spin locks). We introduce a HW module, the synchronization-operation buffer (SB), which queues and manages the requests issued by the processors. Experimental validation has been carried out by using GRAPES, a cycle-accurate performance/power simulation platform. For 8-processor target architecture, we show that the proposed solution achieves up to 40% performance improvement and 30% energy saving with respect to synchronization based on directory-based coherence protocol

[1]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[2]  G. Nicolescu,et al.  Parallel programming models for a multi-processor SoC platform applied to high-speed traffic management , 2004, International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004..

[3]  E. Sackinger,et al.  A single-chip, 1.6-billion, 16-b MAC/s multiprocessor DSP , 2000, IEEE Journal of Solid-State Circuits.

[4]  Wayne H. Wolf,et al.  The future of multiprocessor systems-on-chips , 2004, Proceedings. 41st Design Automation Conference, 2004..

[5]  Sharad Malik,et al.  Flexible and formal modeling of microprocessors with application to retargetable simulation , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[6]  Rohit Bhatia,et al.  Montecito: a dual-core, dual-thread Itanium processor , 2005, IEEE Micro.

[7]  Michael L. Scott,et al.  Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.

[8]  Sarita V. Adve,et al.  Shared Memory Consistency Models: A Tutorial , 1996, Computer.

[9]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[10]  Santanu Dutta,et al.  Viper: A Multiprocessor SOC for Advanced Set-Top Box and Digital TV Systems , 2001, IEEE Des. Test Comput..

[11]  Massimo Poncino,et al.  Exploring energy/performance tradeoffs in shared memory MPSoCs: snoop-based cache coherence vs. software solutions , 2005, Design, Automation and Test in Europe.

[12]  Nathan Ickes,et al.  Instruction level and operating system profiling for energy exposed software , 2003, IEEE Trans. Very Large Scale Integr. Syst..

[13]  Trevor Mudge Power: A First Class Design Constraint for Future Architecture and Automation , 2000, HiPC.

[14]  Xavier Martorell,et al.  Implementing PARMACS Macros for Shared Memory Multiprocessor Environments , 1997 .

[15]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[16]  Thomas E. Anderson,et al.  The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors , 1990, IEEE Trans. Parallel Distributed Syst..

[17]  James R. Goodman,et al.  Efficient Synchronization: Let Them Eat QOLB , 1997, International Symposium on Computer Architecture.

[18]  R. Rajwar,et al.  Transactional Execution: Toward Reliable, High-Performance Multithreading , 2003, IEEE Micro.

[19]  Alessandro Beda,et al.  Heart-Rate Pacing Simulation and Control via Multiagent Systems , 2004 .

[20]  Gianluca Palermo,et al.  PIRATE: A Framework for Power/Performance Exploration of Network-on-Chip Architectures , 2004, PATMOS.