Paper information - Power/Performance Hardware Optimization for Synchronization Intensive Applications in MPSoCs


This paper explores techniques for optimizing the synchronization mechanisms of MPSoCs built on a complex interconnect (network-on-chip), targeting future power-efficient systems. The proposed solution is based on the idea of performing locally those synchronization operations that require continuous polling of a shared variable and therefore suffer from heavy contention (e.g. spin locks). We introduce a HW module, the synchronization-operation buffer (SB), which queues and manages the requests issued by the processors. Experimental validation was carried out using GRAPES, a cycle-accurate performance/power simulation platform. For an 8-processor target architecture, we show that the proposed solution achieves up to 40% performance improvement and 30% energy saving with respect to synchronization based on a directory-based coherence protocol.
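To make the contention argument concrete, the following is a minimal toy model, not the paper's SB design and not GRAPES. It only counts lock requests that cross the interconnect under two assumed policies: every waiting processor re-polling the shared variable once per cycle, versus each processor sending a single request that a buffer queues and grants in FIFO order. The function names, the one-poll-per-cycle assumption, and the `hold_cycles` parameter are all hypothetical choices made for illustration.

```python
from collections import deque


def polling_requests(num_procs, hold_cycles):
    """Count lock requests when every waiter polls once per cycle (assumed model)."""
    waiting = num_procs
    requests = 0
    while waiting > 0:
        # All current waiters issue a request this cycle; exactly one wins.
        requests += waiting
        waiting -= 1
        # The remaining waiters keep polling while the winner holds the lock.
        requests += waiting * hold_cycles
    return requests


def buffered_requests(num_procs):
    """Count lock requests when a buffer queues one request per processor
    and grants the lock in FIFO order (assumed policy)."""
    queue = deque(range(num_procs))  # each processor enqueues exactly once
    requests = len(queue)
    grant_order = []
    while queue:
        # The next queued processor is granted the lock; nobody re-polls.
        grant_order.append(queue.popleft())
    return requests, grant_order


if __name__ == "__main__":
    n, hold = 8, 10  # illustrative values only, not taken from the paper
    print("polling requests: ", polling_requests(n, hold))
    print("buffered requests:", buffered_requests(n)[0])
```

In this toy model the polling traffic grows roughly quadratically with the number of contending processors, while the queued variant stays linear. The counts say nothing about the actual performance or energy figures reported in the abstract.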
