Uniform Minimal First: Latency Reduction in Throughput-Optimal Oblivious Routing for Mesh-Based Networks-on-Chip

Mesh-based networks-on-chip (NoCs) are increasingly used for on-chip communication in embedded multicore processors. O1TURN is a well-known oblivious routing algorithm for mesh-based NoCs that has been shown to be worst-case throughput optimal for even network radices, but not for odd radices. More recently, another oblivious routing algorithm, U2TURN, has been shown to be worst-case throughput optimal for both odd and even radices. It accomplishes this by load-balancing among 2-turn paths in XYX or YXY routing, including nonminimal paths. Besides being worst-case throughput optimal, U2TURN achieves higher throughput than O1TURN under adversarial traffic. However, for random traffic, where the traffic is inherently load-balanced, O1TURN achieves lower network latency because it considers only minimal paths. In this letter, we propose a hybrid oblivious routing algorithm called uniform-minimal-first (UMF) routing. UMF exploits any inherent load-balancing characteristics of the traffic pattern to reduce packet latency, while retaining the throughput optimality of U2TURN for both odd and even radices and matching the performance of U2TURN under adversarial traffic. UMF is very inexpensive to implement: it only requires incrementing two small counters, and only for the head flit when a node injects a new packet (not for any pass-through traffic). These simple updates can be performed one cycle ahead of packet injection to avoid slowing down any router pipeline stage. Despite its simplicity, UMF outperforms U2TURN by 23.7% under random traffic and O1TURN by 12.6% under adversarial traffic.
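
The difference between the minimal paths used by O1TURN and the 2-turn paths used by U2TURN can be illustrated with a small sketch. It is not taken from the letter: the function names, the `(x, y)` coordinate convention, and the `mid_x` parameter are hypothetical, and the probabilities with which U2TURN picks its intermediate column are not modeled. O1TURN is shown as its usual equal-probability choice between XY and YX dimension-ordered routes. The UMF counter logic is not sketched, since the abstract does not specify it.

```python
import random

def xy_path(src, dst):
    """Dimension-ordered route: travel along X first, then along Y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

def yx_path(src, dst):
    """Dimension-ordered route: travel along Y first, then along X."""
    # Swap the coordinates, reuse xy_path, then swap back.
    swapped = xy_path(src[::-1], dst[::-1])
    return [p[::-1] for p in swapped]

def o1turn_path(src, dst, rng=random):
    """O1TURN-style choice: XY or YX with equal probability (always minimal)."""
    return xy_path(src, dst) if rng.random() < 0.5 else yx_path(src, dst)

def xyx_path(src, dst, mid_x):
    """2-turn XYX route via column mid_x (hypothetical parameter).

    The route is nonminimal whenever mid_x lies outside the span
    between the source and destination columns.
    """
    corner1 = (mid_x, src[1])   # end of the first X segment
    corner2 = (mid_x, dst[1])   # end of the Y segment
    first = xy_path(src, corner1)
    second = xy_path(corner1, corner2)
    third = xy_path(corner2, dst)
    # Drop the duplicated corner nodes when joining the segments.
    return first + second[1:] + third[1:]

if __name__ == "__main__":
    src, dst = (0, 0), (2, 1)
    print("XY hops:       ", len(xy_path(src, dst)) - 1)
    print("XYX(mid=4) hops:", len(xyx_path(src, dst, 4)) - 1)
```

In this sketch, routing from (0, 0) to (2, 1) through column 4 takes more hops than the minimal XY route, which is the latency cost that the abstract attributes to considering nonminimal paths.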
