A CAM-Free Exascalable HPC Router for Low-Energy Communications

Power consumption is the main hurdle in the race for designing Exascale-capable computing systems which would require deploying millions of computing elements. While this problem is being addressed by designing increasingly more power-efficient processing subsystems, little effort has been put on reducing the power consumption of the interconnection network. This is precisely the objective of this work, in which we study the benefits, in terms of both area and power, of avoiding costly and power-hungry CAM-based routing tables deep-rooted in all current networking technologies. We present our custom-made, FPGA-based router based on a simple, arithmetic routing engine which is shown to be much more power- and area-efficient than even a relatively small 2K-entry routing table which requires as much area and one order of magnitude more power than our router.

[1]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[2]  Amith R. Mamidala,et al.  Looking under the hood of the IBM Blue Gene/Q network , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[3]  Feroz Zahid,et al.  A Weighted Fat-Tree Routing Algorithm for Efficient Load-Balancing in Infini Band Enterprise Clusters , 2015, 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.

[4]  Jean-Pierre Panziera,et al.  The BXI Interconnect Architecture , 2015, 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects.

[5]  Andrew W. Moore,et al.  NetFPGA SUME: Toward 100 Gbps as Research Commodity , 2014, IEEE Micro.

[6]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[7]  Tomohiro Inoue,et al.  The Tofu Interconnect , 2012, IEEE Micro.

[8]  Javier Navaridas,et al.  Simulating and evaluating interconnection networks with INSEE , 2011, Simul. Model. Pract. Theory.

[9]  Pedro López,et al.  Deterministic versus Adaptive Routing in Fat-Trees , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[10]  Sudhakar Yalamanchili,et al.  Interconnection Networks: An Engineering Approach , 2002 .

[11]  Y. Zhang,et al.  The ExaNeSt Project: Interconnects, Storage, and Packaging for Exascale Systems , 2016, 2016 Euromicro Conference on Digital System Design (DSD).

[12]  Hong Liu,et al.  Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network , 2015, Comput. Commun. Rev..

[13]  Sujata Banerjee,et al.  ElasticTree: Saving Energy in Data Center Networks , 2010, NSDI.

[14]  Alfredo Cuzzocrea,et al.  Big Graph Analytics: The State of the Art and Future Research Agenda , 2014, DOLAP '14.

[15]  William J. Dally,et al.  Technology-Driven, Highly-Scalable Dragonfly Topology , 2008, 2008 International Symposium on Computer Architecture.

[16]  Hong Liu,et al.  Energy proportional datacenter networks , 2010, ISCA.

[17]  Dharma P. Agrawal,et al.  Generalized Hypercube and Hyperbus Structures for a Computer Network , 1984, IEEE Transactions on Computers.

[18]  Khanh-Van Nguyen,et al.  An Interconnection Network Exploiting Trade-Off between Routing Table Size and Path Length , 2016, 2016 Fourth International Symposium on Computing and Networking (CANDAR).

[19]  Fabrizio Petrini,et al.  k-ary n-trees: high performance networks for massively parallel architectures , 1997, Proceedings 11th International Parallel Processing Symposium.

[20]  Jean-Noël Quintin,et al.  The BXI routing architecture for exascale supercomputer , 2016, The Journal of Supercomputing.

[21]  Amin Vahdat,et al.  A scalable, commodity data center network architecture , 2008, SIGCOMM '08.

[22]  Antonio Robles,et al.  Effective methodology for deadlock-free minimal routing in InfiniBand networks , 2002, Proceedings International Conference on Parallel Processing.

[23]  Luiz Marcos Garcia Gonçalves,et al.  Towards green data centers: A comparison of x86 and ARM architectures power efficiency , 2012, J. Parallel Distributed Comput..

[24]  Eitan Zahavi Fat-tree routing and node ordering providing contention free traffic for MPI global collectives , 2012, J. Parallel Distributed Comput..