Approaching Ideal NoC Latency with Pre-Configured Routes

In multi-core ASICs, processors and other compute engines need to communicate with memory blocks and other cores with latency as close as possible to the ideal of a direct buffered wire. However, current state of the art networks-on-chip (NoCs) suffer, at best, latency of one clock cycle per hop. We investigate the design of a NoC that offers close to the ideal latency in some preferred, run-time configurable paths. Processors and other compute engines may perform network reconfiguration to guarantee low latency over different sets of paths as needed. Flits in non-preferred paths are given lower priority than flits in preferred ones, and suffer a delay of one clock cycle per hop when there is no contention. To achieve our goal, we use the "mad-postman" technique: every incoming flit is eagerly (i.e. speculatively) forwarded to the input's preferred output, if any. This is accomplished with the mere delay of a single pre-enabled tri-state driver. We later check if that decision was correct, and if not, we forward the flit to the proper output. Incorrectly forwarded flits are classified as dead and eliminated in later hops. We use a 2D mesh topology tailored for processor-memory communication, and a modified version of XY routing that remains deadlock-free. Performance gains are significant and can be proven greatly useful in other application domains as well

[1]  William J. Dally,et al.  Express Cubes: Improving the Performance of k-Ary n-Cube Interconnection Networks , 1989, IEEE Trans. Computers.

[2]  Kiyoung Choi,et al.  Proceedings of the 4th international conference on Hardware/software codesign and system synthesis , 2006 .

[3]  Simon W. Moore,et al.  Low-latency virtual-channel routers for on-chip networks , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[4]  William J. Dally,et al.  Flit-reservation flow control , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).

[5]  William J. Dally,et al.  Route packets, not wires: on-chip inteconnection networks , 2001, DAC '01.

[6]  Mahmut T. Kandemir,et al.  Design and Management of 3D Chip Multiprocessors Using Network-in-Memory , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[7]  Anant Agarwal,et al.  Scalar operand networks , 2005, IEEE Transactions on Parallel and Distributed Systems.

[8]  Ioannis Papaefstathiou,et al.  Variable packet size buffered crossbar (CICQ) switches , 2004, 2004 IEEE International Conference on Communications (IEEE Cat. No.04CH37577).

[9]  Sharad Malik,et al.  A technology-aware and energy-oriented topology exploration for on-chip networks , 2005, Design, Automation and Test in Europe.

[10]  Chita R. Das,et al.  A low latency router supporting adaptivity for on-chip interconnects , 2005, Proceedings. 42nd Design Automation Conference, 2005..

[11]  Reinaldo A. Bergamaschi,et al.  Designing systems-on-chip using cores , 2000, DAC.

[12]  Luca Benini,et al.  A multi-path routing strategy with guaranteed in-order packet delivery and fault-tolerance for networks on chip , 2006, 2006 43rd ACM/IEEE Design Automation Conference.

[13]  Jürgen Teich,et al.  DyNoC: A dynamic infrastructure for communication in dynamically reconfugurable devices , 2005, International Conference on Field Programmable Logic and Applications, 2005..

[14]  Radu Marculescu Networks-on-chip: the quest for on-chip fault-tolerant communication , 2003, IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings..

[15]  Li Shang,et al.  Thermal Modeling, Characterization and Management of On-Chip Networks , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[16]  Ken Mai,et al.  The future of wires , 2001, Proc. IEEE.

[17]  Stamatis Vassiliadis,et al.  Reconfigurable FLUX networks , 2006, 2006 IEEE International Conference on Field Programmable Technology.

[18]  Lionel M. Ni,et al.  A survey of wormhole routing techniques in direct networks , 1993, Computer.

[19]  Chita R. Das,et al.  ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[20]  Chris R. Jesshope,et al.  High Performance Communications In Processor Networks , 1989, The 16th Annual International Symposium on Computer Architecture.

[21]  Vincenzo Catania,et al.  A methodology for design of application specific deadlock-free routing algorithms for NoC systems , 2006, Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06).

[22]  Partha Pratim Pande,et al.  Performance evaluation and design trade-offs for network-on-chip interconnect architectures , 2005, IEEE Transactions on Computers.

[23]  Dean M. Tullsen,et al.  Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling , 2005, ISCA 2005.

[24]  Sin-Chong Park,et al.  Adaptive routing scheme for NoC communication architecture , 2005, The 7th International Conference on Advanced Communication Technology, 2005, ICACT 2005..

[25]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[26]  Thilo Pionteck,et al.  Applying Partial Reconfiguration to Networks-On-Chips , 2006, 2006 International Conference on Field Programmable Logic and Applications.

[27]  Chita R. Das,et al.  A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[28]  Kunle Olukotun,et al.  The case for a single-chip multiprocessor , 1996, ASPLOS VII.

[29]  Rudy Lauwereins,et al.  Design, Automation, and Test in Europe , 2008 .

[30]  Luciano Lavagno,et al.  Coping with the variability of combinational logic delays , 2004, IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings..

[31]  Radu Marculescu,et al.  DyAD - smart routing for networks-on-chip , 2004, Proceedings. 41st Design Automation Conference, 2004..

[32]  R. Beivide,et al.  Mad-postman : A Look-ahead Message Propagation Method For Static Bidimensional Meshes , 1994, Proceedings. Second Euromicro Workshop on Parallel and Distributed Processing.

[33]  Fabien Clermidy,et al.  An asynchronous NOC architecture providing low latency service and its multi-level design framework , 2005, 11th IEEE International Symposium on Asynchronous Circuits and Systems.

[34]  Radu Marculescu,et al.  Towards on-chip fault-tolerant communication , 2003, ASP-DAC '03.