MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect

A conventional Network-on-Chip (NoC) router uses input buffers to store in-flight packets. These buffers improve performance, but consume significant power. It is possible to bypass these buffers when they are empty, reducing dynamic power, but static buffer power, and dynamic power when buffers are utilized, remain. To improve energy efficiency, buffer less deflection routing removes input buffers, and instead uses deflection (misrouting) to resolve contention. However, at high network load, deflections cause unnecessary network hops, wasting power and reducing performance. In this work, we propose a new NoC router design called the minimally-buffered deflection (MinBD) router. This router combines deflection routing with a small "side buffer," which is much smaller than conventional input buffers. A MinBD router places some network traffic that would have otherwise been deflected in this side buffer, reducing deflections significantly. The router buffers only a fraction of traffic, thus making more efficient use of buffer space than a router that holds every flit in its input buffers. We evaluate MinBD against input-buffered routers of various sizes that implement buffer bypassing, a buffer less router, and a hybrid design, and show that MinBD is more energy efficient than all prior designs, and has performance that approaches the conventional input-buffered router with area and power close to the buffer less router.

[1]  John Kim,et al.  Low-cost router microarchitecture for on-chip networks , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[2]  Anoop Gupta,et al.  Parallel computer architecture - a hardware / software approach , 1998 .

[3]  David Kroft,et al.  Lockup-free instruction fetch/prefetch cache organization , 1998, ISCA '81.

[4]  Hyunjin Lee,et al.  CloudCache: Expanding and shrinking private caches , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[5]  Onur Mutlu,et al.  A case for bufferless routing in on-chip networks , 2009, ISCA '09.

[6]  W. Daniel Hillis,et al.  The connection machine , 1985 .

[7]  Natalie D. Enright Jerger,et al.  SCARAB: A single cycle adaptive routing and bufferless network , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[8]  Stijn Eyerman,et al.  System-Level Performance Metrics for Multiprogram Workloads , 2008, IEEE Micro.

[9]  Edward T. Grochowski,et al.  Larrabee: A many-Core x86 architecture for visual computing , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).

[10]  Sriram R. Vangal,et al.  A 5-GHz Mesh Interconnect for a Teraflops Processor , 2007, IEEE Micro.

[11]  Rajiv Kapoor,et al.  Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[12]  Henry Hoffmann,et al.  The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.

[13]  Valentin Puente,et al.  Rotary router: an efficient architecture for CMP interconnection networks , 2007, ISCA '07.

[14]  Sharad Malik,et al.  Orion: a power-performance simulator for interconnection networks , 2002, MICRO.

[15]  Chris Fallin,et al.  Next generation on-chip networks: what kind of congestion control do we need? , 2010, Hotnets-IX.

[16]  William J. Dally,et al.  Deadlock-Free Message Routing in Multiprocessor Interconnection Networks , 1987, IEEE Transactions on Computers.

[17]  Sharad Malik,et al.  Power-driven Design of Router Microarchitectures in On-chip Networks , 2003, MICRO.

[18]  Onur Mutlu,et al.  Kilo-NOC: A heterogeneous network-on-chip architecture for scalability and service guarantees , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[19]  Shekhar Y. Borkar,et al.  Thousand Core ChipsA Technology Perspective , 2007, 2007 44th ACM/IEEE Design Automation Conference.

[20]  Pedro López,et al.  Reducing Packet Dropping in a Bufferless NoC , 2008, Euro-Par.

[21]  W. Dally,et al.  Route packets, not wires: on-chip interconnection networks , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[22]  Stefano Bregni,et al.  Performance Evaluation of Deflection Routing in Optical IP Packet-Switched Networks , 2004, Cluster Computing.

[23]  Mario R. Casu,et al.  Implementation analysis of NoC: a MPSoC trace-driven approach , 2006, GLSVLSI '06.

[24]  Henry Hoffmann,et al.  On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.

[25]  Luiz André Barroso,et al.  Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[26]  A. Snavely,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[27]  T. N. Vijaykumar,et al.  Adaptive Flow Control for Robust Performance and Energy , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[28]  William J. Dally,et al.  Flattened Butterfly Topology for On-Chip Networks , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[29]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[30]  Kees G. W. Goossens,et al.  Avoiding Message-Dependent Deadlock in Network-Based Systems on Chip , 2007, VLSI Design.

[31]  William J. Dally,et al.  Flattened butterfly: a cost-efficient topology for high-radix networks , 2007, ISCA '07.

[32]  Burton J. Smith Architecture And Applications Of The HEP Multiprocessor Computer System , 1982, Optics & Photonics.

[33]  Xi Wang,et al.  Burst optical deflection routing protocol for wavelength routing WDM networks , 2000, Other Conferences.

[34]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[35]  P. Baran,et al.  On Distributed Communications Networks , 1964 .

[36]  Mithuna Thottethodi,et al.  Self-tuned congestion control for multiprocessor networks , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[37]  Shekhar Y. Borkar Future of interconnect fabric: a contrarian view , 2010, SLIP '10.

[38]  M. Suzuoki,et al.  Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor , 2006, IEEE Journal of Solid-State Circuits.

[39]  Srinivasan Seshan,et al.  Congestion Control for Scalability in Bufferless On-Chip Networks , 2011 .

[40]  George Michelogiannakis,et al.  Evaluating Bufferless Flow Control for On-chip Networks , 2010, 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip.

[41]  D. Lenoski,et al.  The SGI Origin: A ccnuma Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[42]  S. Konstantinidou,et al.  Chaos router: architecture and performance , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.

[43]  Coniferous softwood GENERAL TERMS , 2003 .

[44]  Chris Fallin,et al.  CHIPPER: A low-complexity bufferless deflection router , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[45]  Kevin Kai-Wei Chang,et al.  Adaptive Cluster Throttling: Improving High-Load Performance in Bufferless On-Chip Networks , 2011 .