A multi-Vdd dynamic variable-pipeline on-chip router for CMPs

We propose a multi-voltage (multi-Vdd) variable pipeline router to reduce the power consumption of Network-on-Chips (NoCs) designed for chip multi-processors (CMPs). Our multi-Vdd variable pipeline router adjusts its pipeline depth (i.e., communication latency) and supply voltage level in response to the applied workload. Unlike dynamic voltage and frequency scaling (DVFS) routers, the operating frequency is the same for all routers throughout the CMP; thus, there is no need to synchronize neighboring routers working at different frequencies. In this paper, we implemented the multi-Vdd variable pipeline router, which selects two supply voltage levels and pipeline modes, using a 65nm CMOS process and evaluated it using a full-system CMP simulator. Evaluation results show that although the application performance degraded by 1.0% to 2.1%, the standby power of NoCs reduced by 10.4% to 44.4%.

[1]  Timothy Mark Pinkston,et al.  Characterizing the Cell EIB On-Chip Network , 2007, IEEE Micro.

[2]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[3]  Chita R. Das,et al.  A case for dynamic frequency tuning in on-chip networks , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[4]  William J. Dally,et al.  A delay model and speculative architecture for pipelined routers , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[5]  Niraj K. Jha,et al.  Garnet : A Detailed Interconnect Model Inside a Full-System Simulation Framework , .

[6]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[7]  David A. Wood,et al.  Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[8]  Hajime Shimada,et al.  Pipeline stage unification: a low-energy consumption technique for future mobile processors , 2003, ISLPED '03.

[9]  H. Jin,et al.  - 3-The OpenMP Implementation of NAS Parallel Benchmarks and Its Performance , 1999 .

[10]  Jinuk Luke Shin,et al.  A Power-Efficient High-Throughput 32-Thread SPARC Processor , 2006, 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers.

[11]  Hiroshi Nakamura,et al.  Performance, Area, and Power Evaluations of Ultrafine-Grained Run-Time Power-Gating Routers for CMPs , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[12]  Hideharu Amano,et al.  A variable-pipeline on-chip router optimized to traffic pattern , 2010, NoCArc '10.

[13]  Timothy Mattson,et al.  A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[14]  H. Lhermet,et al.  An Asynchronous Power Aware and Adaptive NoC Based Circuit , 2009, IEEE Journal of Solid-State Circuits.

[15]  Henry Hoffmann,et al.  On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.