Footprint: Regulating routing adaptiveness in Networks-on-Chip

Routing algorithms can improve network performance by maximizing routing adaptiveness but can be problematic in the presence of endpoint congestion. Tree-saturation is a well-known behavior caused by endpoint congestion. Adaptive routing can, however, spread the congestion and result in thick branches of the congestion tree — creating Head-of-Line (HoL) blocking and degrading performance. In this work, we identify how ignoring virtual channels (VCs) and their occupancy during adaptive routing results in congestion trees with thick branches as congestion is spread to all VCs. To address this limitation, we propose Footprint routing algorithm — a new adaptive routing algorithm that minimizes the size of the congestion tree, both in terms of the number of nodes in the congestion tree as well as branch thickness. Footprint achieves this by regulating adaptiveness by requiring packets to follow the path of prior packets to the same destination if the network is congested instead of forking a new path or VC. Thus, the congestion tree is dynamically kept as slim as possible and reduces HoL blocking or congestion spreading while maintaining high adaptivity and maximizing VC buffer utilization. We evaluate the proposed Footprint routing algorithm against other adaptive routing algorithms and our simulation results show that the network saturation throughput can be improved by up to 43% (58%) compared with the fully adaptive routing (partially adaptive routing) algorithms.

[1]  John Kim,et al.  Contention-based congestion management in large-scale networks , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[2]  John Kim,et al.  Overcoming far-end congestion in large-scale networks , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[3]  José Duato,et al.  HoL-Blocking Avoidance Routing Algorithms in Direct Topologies , 2014, 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS).

[4]  Ying Gao,et al.  SurfNoC: a low latency and provably non-interfering approach to secure networks-on-chip , 2013, ISCA.

[5]  Nan Jiang,et al.  A detailed and flexible cycle-accurate Network-on-Chip simulator , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[6]  Henri Casanova,et al.  A case for random shortcut topologies for HPC interconnects , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[7]  Nan Jiang,et al.  Network congestion avoidance through Speculative Reservation , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[8]  Natalie D. Enright Jerger,et al.  DBAR: An efficient routing algorithm to support multiple concurrent applications in networks-on-chip , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[9]  Huawei Li,et al.  An abacus turn model for time/space-efficient reconfigurable routing , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[10]  José Duato,et al.  Buffer Management Strategies to Reduce HoL Blocking , 2010, IEEE Transactions on Parallel and Distributed Systems.

[11]  Onur Mutlu,et al.  Preemptive Virtual Clock: A flexible, efficient, and cost-effective QOS scheme for networks-on-chip , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[12]  Lei Liu,et al.  Godson-T: An Efficient Many-Core Architecture for Parallel Program Executions , 2009, Journal of Computer Science and Technology.

[13]  Nan Jiang,et al.  Indirect adaptive routing on large scale interconnection networks , 2009, ISCA '09.

[14]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[15]  Axel Jantsch,et al.  TDM Virtual-Circuit Configuration for Network-on-Chip , 2008, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[16]  William J. Dally,et al.  Technology-Driven, Highly-Scalable Dragonfly Topology , 2008, 2008 International Symposium on Computer Architecture.

[17]  Krste Asanovic,et al.  Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks , 2008, 2008 International Symposium on Computer Architecture.

[18]  Stephen W. Keckler,et al.  Regional congestion awareness for load balance in networks-on-chip , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[19]  S. Borkar,et al.  An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS , 2008, IEEE Journal of Solid-State Circuits.

[20]  William J. Dally,et al.  Flattened butterfly: a cost-efficient topology for high-radix networks , 2007, ISCA '07.

[21]  William J. Dally,et al.  Adaptive Routing in High-Radix Clos Network , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[22]  José Duato,et al.  A new scalable and cost-effective congestion management strategy for lossless multistage interconnection networks , 2005, 11th International Symposium on High-Performance Computer Architecture.

[23]  James R. Goodman,et al.  Billion-transistor architectures: there and back again , 2004, Computer.

[24]  Henry Hoffmann,et al.  The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.

[25]  David L. Black,et al.  The Addition of Explicit Congestion Notification (ECN) to IP , 2001, RFC.

[26]  Ge-Ming Chiu,et al.  The Odd-Even Turn Model for Adaptive Routing , 2000, IEEE Trans. Parallel Distributed Syst..

[27]  Jean C. Walrand,et al.  Achieving 100% throughput in an input-queued switch , 1996, Proceedings of IEEE INFOCOM '96. Conference on Computer Communications.

[28]  José Duato,et al.  A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks , 1993, IEEE Trans. Parallel Distributed Syst..

[29]  William J. Dally,et al.  Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels , 1993, IEEE Trans. Parallel Distributed Syst..

[30]  K. Ramakrishnan,et al.  A binary feedback scheme for congestion avoidance in computer networks , 1990, TOCS.

[31]  Gregory F. Pfister,et al.  “Hot spot” contention and combining in multistage interconnection networks , 1985, IEEE Transactions on Computers.

[32]  David Wentzlaff,et al.  Processor: A 64-Core SoC with Mesh Interconnect , 2010 .

[33]  William J. Dally,et al.  Microarchitecture of a high radix router , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[34]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[35]  Lionel M. Ni,et al.  The turn model for adaptive routing , 1998, ISCA '98.