A high-density data path implementation fitting for HTC applications

High Throughput Computing (HTC) applications have become a major new class of workloads with the rapid rise of web services. We observe that in HTC applications a significant proportion of memory accesses are of small granularity, such as 1B or 2B. In traditional NoCs, however, the link width is usually 128 bits or larger in order to achieve high throughput, and the entire link is occupied for a transfer no matter how small the flit is. Using traditional NoCs for HTC applications therefore wastes bandwidth. To address this problem, we propose the High-Density NoC (HD-NoC). In HD-NoC, each traditional link is split into several narrow channels, for example 8 or 16 bits wide. If each slice is 16 bits wide, there are 8 or more independently controlled small channels operating simultaneously in one direction. Combined with our Greedy Transfer Mechanism (GTM), flits travelling in the same direction can be transferred in parallel, which alleviates congestion and improves the effective utilization of bandwidth. Experiments show that for HTC applications, HD-NoC improves throughput by 22.2% on average and by 32.4% for the Grep application, with little extra hardware. HD-NoC also improves throughput by 13.5% for the traditional SPLASH-2 benchmarks.
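The bandwidth argument above can be illustrated with a little arithmetic. The sketch below is not the authors' model: it assumes a payload-only transfer with no header or control bits, and the function names (`full_link_utilization`, `sliced_utilization`, `channels_per_link`) are hypothetical. It compares how much of the occupied wiring carries useful data when a 1B or 2B access uses a whole 128-bit link versus only as many 16-bit slices as it needs.

```python
import math

def full_link_utilization(payload_bits, link_bits=128):
    """Useful fraction of the wires when a payload occupies the whole link.

    Assumption: the full link is held for every cycle the payload needs,
    and header/control overhead is ignored.
    """
    cycles = math.ceil(payload_bits / link_bits)
    return payload_bits / (cycles * link_bits)

def sliced_utilization(payload_bits, channel_bits=16):
    """Useful fraction of the wires when only the needed slices are occupied."""
    channels = math.ceil(payload_bits / channel_bits)
    return payload_bits / (channels * channel_bits)

def channels_per_link(link_bits=128, channel_bits=16):
    """Number of narrow channels obtained by splitting one link."""
    return link_bits // channel_bits

for payload_bytes in (1, 2):
    bits = payload_bytes * 8
    print(payload_bytes, "B:",
          full_link_utilization(bits),
          sliced_utilization(bits))
```

Under these simplifying assumptions, a 2B access uses one eighth of a 128-bit link but fills a 16-bit slice completely, and the remaining slices stay free for other flits heading in the same direction.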
