A Low-Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors

A new asynchronous interconnection network is introduced for globally-asynchronous locally-synchronous (GALS) chip multiprocessors. The network eliminates the need for global clock distribution and can interface multiple synchronous timing domains operating at unrelated clock rates. In particular, two new highly concurrent asynchronous components are introduced, which provide simple routing and arbitration/merge functions. Post-layout simulations in the same commercial 90 nm technology indicate that comparable recent synchronous router nodes consume 5.6-10.7× more energy per packet and occupy 2.8-6.4× more area than the new asynchronous nodes. Under random traffic, the new network provides significantly lower latency and identical throughput over the entire operating range of the 800 MHz network and through mid-range traffic rates for the 1.36 GHz network, but with degradation at higher traffic rates. Preliminary evaluations are also presented for a mixed-timing (GALS) network in a shared-memory parallel architecture running both random traffic and parallel benchmark kernels, together with directions for further improvement.
