Centralized buffer router: A low-latency, low-power router for high-radix NoCs

Router buffers act as performance multipliers, but they are also major consumers of area and power in on-chip networks. In this paper, we propose the centralized elastic bubble router, a router microarchitecture based on centralized buffers (CB) combined with elastically buffered (EB) links. At low loads, the CB is power-gated and bypassed, so the router can operate in a single cycle. A novel extension to bubble flow control uses one mechanism to avoid both routing deadlock and message-dependent deadlock, with a constant buffer size per router that is independent of the number of message types. This reduces end-to-end latency through high-radix switches while keeping overall buffer requirements low. Comparisons with other low-latency routers across different topologies show consistent performance improvements, for example a 26% improvement in no-load latency on a 2D mesh and a 4X improvement in saturation throughput on a 2D generalized hypercube.
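The abstract builds on bubble flow control. The sketch below shows only the classic bubble rule that the paper extends, not the paper's own extension: on a unidirectional ring, a packet entering the ring must find two free packet slots in the next buffer (its own plus a spare "bubble"), while a packet already in the ring needs only one. The ring, the per-node shared buffer standing in for a centralized buffer, the injection-first arbitration (chosen so the ring can fill up), and all names such as `simulate`, `slots_needed`, and `capacity` are hypothetical assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """One router on a unidirectional ring (hypothetical model). `capacity` is
    the number of packet-sized slots in its shared buffer."""
    capacity: int
    queue: deque = field(default_factory=deque)  # destination index of each buffered packet

    def free(self) -> int:
        return self.capacity - len(self.queue)


def slots_needed(entering_ring: bool, bubble: bool) -> int:
    """Classic bubble condition: a packet entering the ring needs two free slots
    (its own plus a spare bubble); a packet already in the ring needs one."""
    return 2 if (bubble and entering_ring) else 1


def simulate(n_nodes: int = 4, capacity: int = 2, cycles: int = 50,
             bubble: bool = True) -> int:
    """Every node injects whenever allowed; each packet travels n_nodes - 1 hops.
    Returns the number of packets delivered after `cycles` cycles."""
    ring = [Node(capacity) for _ in range(n_nodes)]
    delivered = 0
    for _ in range(cycles):
        # Credit-style snapshot: a slot freed this cycle is usable only next cycle.
        budget = [node.free() for node in ring]
        # 1) Injection decisions (assumed to win arbitration, so the ring can fill).
        injectors = []
        for i in range(n_nodes):
            if budget[i] >= slots_needed(entering_ring=True, bubble=bubble):
                budget[i] -= 1
                injectors.append(i)
        # 2) Each head packet either ejects at its destination or advances one hop.
        moves = []
        for i, node in enumerate(ring):
            if not node.queue:
                continue
            if node.queue[0] == i:  # reached destination: eject
                node.queue.popleft()
                delivered += 1
                continue
            nxt = (i + 1) % n_nodes
            if budget[nxt] >= slots_needed(entering_ring=False, bubble=bubble):
                budget[nxt] -= 1
                moves.append((i, nxt))
        # Apply all hops; a destination with a pending move is non-empty,
        # so appending to its tail never changes its head.
        for src, dst in moves:
            ring[dst].queue.append(ring[src].queue.popleft())
        # 3) Injected packets join the tail of the local buffer.
        for i in injectors:
            ring[i].queue.append((i + n_nodes - 1) % n_nodes)
    return delivered
```

With the defaults, `simulate(bubble=True)` keeps delivering packets, while `simulate(bubble=False)` lets every buffer fill with packets that are not at their destination, so no packet can ever move again. The bubble rule works because each injection consumes one slot but requires two free ones, so the ring as a whole always keeps at least one free slot and some packet can always advance.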
