ElastiStore: Flexible Elastic Buffering for Virtual-Channel-Based Networks on Chip

As multicore systems transition to the many-core realm, the pressure on the interconnection network rises substantially. The network on chip (NoC) is expected to meet the growing demands of ever-increasing numbers of processing elements, while its area and power footprint remains severely constrained. Hence, low-cost NoC designs that achieve high-throughput, low-latency operation are imperative for future scalability. While the buffers of the NoC routers are key enablers of high performance, they are also major consumers of area and power. In this paper, we extend elastic buffer (EB) architectures to support multiple virtual channels (VCs), and we derive ElastiStore, a novel lightweight EB architecture that minimizes buffering requirements without sacrificing performance. ElastiStore uses just one register per VC and a shared buffer sized just large enough to cover the round-trip time that arises either on the NoC links or from the internal pipeline of the NoC routers. Integrating the proposed EB scheme into the NoC router enables efficient architectures that offer the same performance as baseline VC-based routers at a significantly lower cost. Cycle-accurate network simulations, using both synthetic traffic patterns and real application workloads running in a full-system simulation framework, verify the efficacy of the proposed architecture. Moreover, hardware implementation results using a 45-nm standard-cell library demonstrate ElastiStore's efficiency.
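The following back-of-the-envelope sketch makes the buffering claim concrete by counting flit slots per router input port. The ElastiStore count follows the description above: one register per VC plus one shared buffer of round-trip-time (RTT) slots.

The baseline count rests on an assumption, not a figure from this paper. It assumes each VC has a private buffer as deep as the RTT. That is the usual rule for letting a single VC stream at full rate under credit-based flow control. The names `PortConfig`, `baseline_slots`, and `elastistore_slots`, and the example parameter values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortConfig:
    num_vcs: int     # V: virtual channels per input port (hypothetical value)
    round_trip: int  # RTT in cycles, from link delay plus router pipeline (hypothetical value)

    def __post_init__(self) -> None:
        if self.num_vcs < 1 or self.round_trip < 1:
            raise ValueError("num_vcs and round_trip must be positive")


def baseline_slots(cfg: PortConfig) -> int:
    """Assumed baseline: every VC owns a private buffer deep enough to cover
    the round trip by itself, giving V * RTT flit slots per port."""
    return cfg.num_vcs * cfg.round_trip


def elastistore_slots(cfg: PortConfig) -> int:
    """ElastiStore as described in the abstract: one register per VC plus a
    single shared buffer sized to the round trip, giving V + RTT slots."""
    return cfg.num_vcs + cfg.round_trip


if __name__ == "__main__":
    for v, rtt in [(2, 3), (4, 3), (8, 5)]:
        cfg = PortConfig(num_vcs=v, round_trip=rtt)
        print(f"V={v}, RTT={rtt}: baseline={baseline_slots(cfg)}, "
              f"ElastiStore={elastistore_slots(cfg)}")
    # V=2, RTT=3: baseline=6, ElastiStore=5
    # V=4, RTT=3: baseline=12, ElastiStore=7
    # V=8, RTT=5: baseline=40, ElastiStore=13
```

Under this model, the gap widens as VCs are added. The assumed baseline grows as V × RTT, while the ElastiStore count grows only as V + RTT.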
