A Quantitative Study of the On-Chip Network and Memory Hierarchy Design for Many-Core Processor

In this paper, we study the on-chip network and memory hierarchy design of Godson-T, a homogeneous many-core processor. Godson-T has 64 cores (each with a private L1 cache) and 16 global L2 cache banks. All of these on-chip units are connected by a 2D 8 × 8 mesh network. Our study reveals that: (a) A global on-chip L2 cache can effectively alleviate the memory pressure caused by the data-thirsty on-chip computing engines. However, its potential is still limited by both the off-chip and the on-chip bandwidth, especially as the number of active threads increases. (b) On-chip traffic congestion is largely caused by the intensive memory access requests issued by the on-chip cores. Therefore, the design of the on-chip network must take into account the available performance of the datapath that connects the processor to main memory. (c) In theory, different applications have different communication patterns (Berkeley's view). However, an application's runtime communication pattern is determined only by the design of the underlying memory hierarchy and on-chip interconnection. These conclusions are generally applicable to a wide variety of many-core processors with similar designs.
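To give a rough sense of the distances involved in a 2D 8 × 8 mesh, the following minimal sketch computes hop counts between mesh nodes. It assumes minimal (shortest-path) routing and uniformly distributed source and destination pairs; neither the routing algorithm nor the traffic distribution is stated in the abstract, and the names `hops`, `average_hops`, and `diameter` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

MESH_DIM = 8  # 8 x 8 mesh, as stated in the abstract


def hops(src, dst):
    """Hop count between two (x, y) nodes, assuming minimal routing."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def average_hops(dim=MESH_DIM):
    """Mean hop count over all ordered pairs of distinct nodes.

    Assumes uniform traffic, which is an illustrative simplification.
    """
    nodes = list(product(range(dim), repeat=2))
    total = sum(hops(a, b) for a in nodes for b in nodes if a != b)
    pairs = len(nodes) * (len(nodes) - 1)
    return total / pairs


def diameter(dim=MESH_DIM):
    """Longest minimal path in the mesh (corner to opposite corner)."""
    return 2 * (dim - 1)


if __name__ == "__main__":
    print(f"average hops: {average_hops():.3f}")
    print(f"diameter:     {diameter()}")
```

Under these assumptions a request crosses several links on average, so memory traffic from many active cores shares the same mesh links, which is consistent with the bandwidth concern raised in points (a) and (b).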