Static and dynamic virtual channel allocation for high performance, in-order communication in on-chip networks

Most routers in on-chip interconnection networks (OCINs) have multiple virtual channels (VCs) to mitigate the effects of head-of-line blocking. Multiple VCs necessitate a VC allocation scheme, since packets or flows must compete for channels whenever a link carries more flows than it has virtual channels. Conventional dynamic VC allocation, however, raises two critical issues. First, it still suffers from considerable head-of-line blocking, since any flow can be assigned to any VC within a link. Second, it compromises the guarantee of in-order delivery even when used with basic variants of dimension-ordered routing, requiring large reorder buffers at the destination core or, alternatively, expensive retransmission logic. In this thesis, we present two virtual channel allocation schemes that address these problems: Static Virtual Channel Allocation (static VCA) and Exclusive Dynamic Virtual Channel Allocation (EDVCA). Static VCA assigns channels to flows by precomputation when oblivious routing is used, and ensures deadlock freedom for arbitrary minimal routes when two or more VCs are available. EDVCA, by contrast, operates at runtime and requires no advance knowledge of traffic patterns or routes. We demonstrate that both static VCA and EDVCA guarantee in-order packet delivery under single-path routing and that both outperform conventional (out-of-order) dynamic VC allocation by effectively reducing head-of-line blocking. We also introduce a novel bandwidth-sensitive oblivious routing scheme (BSORM), which is made deadlock-free through appropriate static VC allocation. Implementing these schemes requires only minor, inexpensive changes to traditional oblivious dimension-ordered router architectures, a cost more than offset by the removal of packet reorder buffers and logic.

Thesis Supervisor: Srinivas Devadas
Title: Associate Department Head, Professor
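
To illustrate the ordering argument, the sketch below models a single link whose VCs are FIFO queues. It is a toy model under stated assumptions, not the thesis's router design: the names `static_vc`, `traverse_link`, and `in_order` are hypothetical, the `flow_id % num_vcs` mapping merely stands in for a precomputed static table, round-robin placement stands in for conventional dynamic allocation, and draining one VC completely before the next is a simplified arbiter.

```python
from collections import deque

def static_vc(flow_id, num_vcs):
    # Hypothetical stand-in for a precomputed table: one fixed VC per flow.
    return flow_id % num_vcs

def traverse_link(packets, num_vcs, pick_vc, drain_order):
    """Place packets into per-VC FIFOs, then drain the VCs in drain_order.

    packets     : list of (flow_id, sequence_number) in injection order
    pick_vc     : function (arrival_index, packet) -> VC index
    drain_order : order in which a simplified arbiter empties the VCs
    """
    vcs = [deque() for _ in range(num_vcs)]
    for i, pkt in enumerate(packets):
        vcs[pick_vc(i, pkt)].append(pkt)
    delivered = []
    for vc in drain_order:
        while vcs[vc]:
            delivered.append(vcs[vc].popleft())
    return delivered

def in_order(delivered):
    # True if, within every flow, sequence numbers never decrease.
    last_seen = {}
    for flow, seq in delivered:
        if seq < last_seen.get(flow, -1):
            return False
        last_seen[flow] = seq
    return True

packets = [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1)]

# Static allocation: each flow stays in one FIFO, so its order is preserved.
static_out = traverse_link(packets, 2, lambda i, p: static_vc(p[0], 2), [1, 0])

# Dynamic stand-in: packets of one flow spread over both VCs and can reorder.
dynamic_out = traverse_link(packets, 2, lambda i, p: i % 2, [1, 0])

print(in_order(static_out), in_order(dynamic_out))
```

In this model, pinning a flow to one FIFO keeps its packets ordered regardless of how the arbiter interleaves the VCs, whereas spreading a flow across VCs lets a later packet overtake an earlier one.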
