A Low-Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors

A new asynchronous interconnection network is introduced for globally-asynchronous locally-synchronous (GALS) chip multiprocessors. The network eliminates the need for global clock distribution and can interface multiple synchronous timing domains operating at unrelated clock rates. In particular, two new highly concurrent asynchronous components are introduced, which provide simple routing and arbitration/merge functions. Post-layout simulations in an identical commercial 90 nm technology indicate that comparable recent synchronous router nodes consume 5.6–10.7× more energy per packet and occupy 2.8–6.4× more area than the new asynchronous nodes. Under random traffic, the asynchronous network provides significantly lower latency and competitive throughput compared with an 800 MHz synchronous network over that network's entire operating range, and compared with a 1.36 GHz synchronous network through mid-range traffic rates, although its performance degrades relative to the 1.36 GHz network at higher traffic rates. Preliminary evaluations are also presented for a mixed-timing (GALS) network in a shared-memory parallel architecture, running both random traffic and parallel benchmark kernels, along with directions for further improvement.
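To make the two component roles concrete, here is a minimal behavioral sketch in Python. It is not the paper's circuit design: it ignores timing, handshaking, and concurrency entirely, and every name in it (`Packet`, `RoutingNode`, `ArbitrationMergeNode`, `fan_out`) is hypothetical. It assumes a routing node steers each packet to one of two outputs using a single destination-address bit, as in a binary fan-out tree. It also assumes an arbitration/merge node forwards one packet at a time from two inputs. The alternating tie-break under contention is a modeling assumption chosen only to show starvation-free merging.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    dest: int      # destination leaf index (hypothetical field)
    payload: str


class RoutingNode:
    """Hypothetical 1-to-2 routing primitive (behavioral model only).

    Chooses output 0 or 1 from a single bit of the packet's destination
    address, as one node of a binary fan-out tree would.
    """

    def __init__(self, bit: int):
        self.bit = bit  # address bit inspected by this node

    def route(self, pkt: Packet) -> int:
        return (pkt.dest >> self.bit) & 1


class ArbitrationMergeNode:
    """Hypothetical 2-to-1 arbitration/merge primitive (behavioral model only).

    Forwards one packet per step. A lone waiting input is served at once;
    when both inputs wait, grants alternate (an assumed tie-break) so that
    neither input starves.
    """

    def __init__(self):
        self.inputs = (deque(), deque())
        self.last_grant = 1  # so input 0 wins the first contested step

    def step(self):
        waiting = [i for i in (0, 1) if self.inputs[i]]
        if not waiting:
            return None
        grant = 1 - self.last_grant if len(waiting) == 2 else waiting[0]
        self.last_grant = grant
        return self.inputs[grant].popleft()


def fan_out(packets, depth):
    """Deliver packets to 2**depth leaves through a depth-level binary tree
    of routing nodes; the root inspects the most significant address bit."""
    levels = [RoutingNode(bit) for bit in reversed(range(depth))]
    leaves = {i: [] for i in range(2 ** depth)}
    for pkt in packets:
        index = 0
        for node in levels:  # one routing decision per tree level
            index = (index << 1) | node.route(pkt)
        leaves[index].append(pkt.payload)
    return leaves


if __name__ == "__main__":
    pkts = [Packet(2, "a"), Packet(0, "b"), Packet(3, "c"), Packet(2, "d")]
    print(fan_out(pkts, depth=2))
    # {0: ['b'], 1: [], 2: ['a', 'd'], 3: ['c']}

    merge = ArbitrationMergeNode()
    merge.inputs[0].extend(["x0", "x1"])
    merge.inputs[1].extend(["y0"])
    print([merge.step() for _ in range(4)])
    # ['x0', 'y0', 'x1', None]
```

Because each routing decision depends only on one address bit, the routing node needs no arbitration. Only the merge direction must resolve contention between inputs, which is why the two functions are modeled as separate primitives here.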
