论文信息 - An Efficient Implementation of Distributed Routing Algorithms for NoCs

An Efficient Implementation of Distributed Routing Algorithms for NoCs

The design of NoCs for multi-core chips introduces new design constraints like power consumption, area, and ultra low latencies. Although 2D meshes are preferred, heterogeneous blocks, fabrication faults, reliability issues, and chip virtualization may lead to the need of irregular topologies or regions. In this situation, efficient routing becomes a challenge. Although the use of routing tables at switches is flexible, it does not scale in terms of latency and area due to its memory requirements. LBDR (logic-based distributed routing) is proposed as a new routing method that removes the need of using routing tables at all. LBDR enables the implementation of many routing algorithms on most of the practical topologies we might find in the near future in a multi-core system. From an initial topology and routing algorithm, a set of three bits per switch/output port is computed. Evaluation results show that, by using a small logic, LBDR mimics the performance of routing algorithms when implemented with routing tables, both in regular and irregular topologies.

José Duato | Davide Bertozzi | Jose Flich | Simone Medardoni | Samuel Rodrigo

[1] Jan van Leeuwen,et al. Interval Routing , 1987, Computer/law journal.

[2] Federico Angiolini,et al. /spl times/pipes Lite: a synthesis oriented design library for networks on chips , 2005, Design, Automation and Test in Europe.

[3] H. Peter Hofstee,et al. Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..

[4] Shekhar Y. Borkar,et al. iWarp: an integrated solution to high-speed parallel computing , 1988, Proceedings. SUPERCOMPUTING '88.

[5] Luca Benini,et al. Exploring High-Dimensional Topologies for NoC Design Through an Integrated Analysis and Synthesis Framework , 2008 .

[6] Pedro López,et al. A memory-effective routing strategy for regular interconnection networks , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[7] Trevor Mudge,et al. Razor: a low-power pipeline based on circuit-level timing speculation , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[8] Anant Agarwal,et al. Scalar operand networks: on-chip interconnect for ILP in partitioned architectures , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[9] Ran Ginosar,et al. Routing Table Minimization for Irregular Mesh NoCs , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.

[10] Norman P. Jouppi,et al. Cacti 3. 0: an integrated cache timing, power, and area model , 2001 .

[11] Lionel M. Ni,et al. The Turn Model for Adaptive Routing , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.

[12] DAVID GELERNTER. A DAG-Based Algorithm for Prevention of Store-and-Forward Deadlock in Packet Networks , 1981, IEEE Transactions on Computers.

[13] Sriram R. Vangal,et al. A 5-GHz Mesh Interconnect for a Teraflops Processor , 2007, IEEE Micro.

[14] Antonio Robles,et al. A Flexible Routing Scheme for Networks of Workstations , 2000, ISHPC.

[15] Pedro López,et al. Region-Based Routing: An Efficient Routing Mechanism to Tackle Unreliable Hardware in Network on Chips , 2007, First International Symposium on Networks-on-Chip (NOCS'07).

[16] Norman P. Jouppi,et al. WRL Research Report 93/5: An Enhanced Access and Cycle Time Model for On-chip Caches , 1994 .

[17] Shashi Kumar,et al. A Method for Router Table Compression for Application Specific Routing in Mesh Topology NoC Architectures , 2006, SAMOS.

[18] Luca Benini,et al. Error control schemes for on-chip communication links: the energy-reliability tradeoff , 2005, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[19] Luca Benini,et al. Fault Tolerance Overhead in Network-on-Chip Flow Control Schemes , 2005, 2005 18th Symposium on Integrated Circuits and Systems Design.