Prevention flow-control for low latency torus networks-on-chip

The challenge for on-chip networks is to provide low latency communication in a very low power budget. To reduce the latency and keep the simplicity of a mesh network, torus network is proposed. As torus networks have inherent circular dependency, additional effort is needed to prevent deadlock, even if deadlock free routing algorithms are used. We describe a novel flow-control mechanism to address cost/performance constraints in torus networks and ensure freedom from deadlock. Flow-control is achieved using a prevention mechanism which uses virtual cut-through switching, and deadlock freedom is achieved by considering only a single packet buffer per input port. We can simplify the router design by having a simple switch allocator, which prioritizes insight packets, and a single packet buffer per input port, which eliminates the need for virtual channels. Experimental validation reveals that our design achieves significant improvement in throughput, as compared to the traditional design, using significantly fewer buffers.

[1]  William J. Dally,et al.  Design tradeoffs for tiled CMP on-chip networks , 2006, ICS '06.

[2]  Sharad Malik,et al.  Power-driven design of router microarchitectures in on-chip networks , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[3]  José Duato,et al.  A Necessary and Sufficient Condition for Deadlock-Free Routing in Cut-Through and Store-and-Forward Networks , 1996, IEEE Trans. Parallel Distributed Syst..

[4]  W. Dally,et al.  Route packets, not wires: on-chip interconnection networks , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[5]  Sriram R. Vangal,et al.  A 5-GHz Mesh Interconnect for a Teraflops Processor , 2007, IEEE Micro.

[6]  M. Erez,et al.  Express Virtual Channels with Capacitively Driven Global Links , 2009, IEEE Micro.

[7]  William J. Dally,et al.  Deadlock-Free Message Routing in Multiprocessor Interconnection Networks , 1987, IEEE Transactions on Computers.

[8]  Antonio Robles,et al.  A Comparison of Router Architectures for Virtual Cut-Through and Wormhole Switching in a NOW Environment , 2001, J. Parallel Distributed Comput..

[9]  Sheldon B. Akers,et al.  A Group-Theoretic Model for Symmetric Interconnection Networks , 1989, IEEE Trans. Computers.

[10]  Yu Cao,et al.  Predictive Technology Model for Nano-CMOS Design Exploration , 2006, 2006 1st International Conference on Nano-Networks and Workshops.

[11]  Simon W. Moore,et al.  Low-latency virtual-channel routers for on-chip networks , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[12]  Steven L. Scott,et al.  The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus , 1996 .

[13]  Carmen Carrión,et al.  A flow control mechanism to avoid message deadlock in k-ary n-cube networks , 1997, Proceedings Fourth International Conference on High-Performance Computing.

[14]  A. Kumary,et al.  A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS , 2007 .

[15]  Sudhakar Yalamanchili,et al.  Interconnection Networks: An Engineering Approach , 2002 .

[16]  José Duato,et al.  Adaptive bubble router: a design to improve performance in torus networks , 1999, Proceedings of the 1999 International Conference on Parallel Processing.

[17]  Niraj K. Jha,et al.  A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS , 2007, ICCD.

[18]  William J. Dally,et al.  Flattened Butterfly Topology for On-Chip Networks , 2007, IEEE Comput. Archit. Lett..

[19]  William J. Dally,et al.  The torus routing chip , 2005, Distributed Computing.

[20]  John Kim,et al.  Low-cost router microarchitecture for on-chip networks , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[21]  Leonard Kleinrock,et al.  Virtual Cut-Through: A New Computer Communication Switching Technique , 1979, Comput. Networks.

[22]  Andrew B. Kahng,et al.  ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[23]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .