CSquare: A new kilo-core-oriented topology

Abstract: As the number of cores on a multicore chip grows, kilo-core processors are expected to become a trend in Network-on-Chip (NoC) development, and the network topology must scale effectively to support them. In this paper, we propose CSquare, a new scalable topology for kilo-core processors. CSquare is a cluster-based structure in which each cluster is a variation of the butterfly-tree. Within each cluster, the routers adopt a parallel-level structure to obtain efficient global connections. With the global interconnections between clusters, the topology can be scaled up much faster than with a single cluster. Based on these characteristics, we propose a deadlock-free routing algorithm for CSquare, called GNCA. Compared with 2D-Mesh and Torus, CSquare employs fewer routers and links. Simulation results indicate that CSquare greatly improves average latency and throughput under various traffic patterns while consuming less power and area.
