Packet-Switched On-Chip FPGA Overlay Networks

As chip capacities grow, it becomes possible to map large, concurrent applications onto programmable fabrics. These applications often have irregular and dynamic communication requirements, and packet-switched networks provide efficient implementations for them on these fabrics. In this research, we show how to engineer high-performance packet-switched on-chip networks and provide quantitative comparisons between different kinds of these networks. We analyse different network topologies and justify our selection of topologies with experimental results. We also investigate packet-switched and time-multiplexed styles of routing and provide guidance on which style is appropriate for which application.
