Partition-Aware Routing to Improve Network Isolation in InfiniBand-Based Multi-tenant Clusters

InfiniBand (IB) is a widely used network interconnect for modern high-performance computing systems. In large IB fabrics, isolation of nodes is provided through partitioning. The routing algorithm, however, is unaware of these partitions, so traffic flows belonging to different partitions might share links inside the network fabric. This sharing of intermediate links creates interference, which is particularly critical to avoid in multi-tenant environments such as a cloud. In such systems, each tenant should experience predictable network performance, unaffected by the workload of other tenants. In addition, current routing schemes consider routes crossing partition boundaries when distributing routes onto links in the network, even though these routes will never be used. The result is degraded load-balancing. In this paper, we present a novel partition-aware fat-tree routing algorithm, pFTree. The pFTree algorithm uses several mechanisms to provide network-wide isolation of partitions belonging to different tenant groups. Given the available network resources, pFTree starts by isolating partitions at the physical link level and then moves on to use virtual lanes if needed. Our experiments and simulations show that pFTree significantly reduces the effect of inter-partition interference without any additional functional overhead. Furthermore, pFTree provides improved load-balancing over the de facto standard IB fat-tree routing algorithm.
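The load-balancing point above can be illustrated with a small sketch. It is not the pFTree algorithm; it only counts how many source-destination pairs a partition-oblivious scheme would balance versus how many can actually carry traffic. The function name `count_routes`, the node labels, and the partition keys are hypothetical and chosen for illustration.

```python
from itertools import permutations


def count_routes(partitions):
    """Count ordered (src, dst) pairs among the given nodes.

    `partitions` maps a node name to the set of partition keys it belongs to
    (an assumed representation, not taken from the paper).
    Returns (total_pairs, usable_pairs), where a pair is usable only if the
    two nodes share at least one partition.
    """
    total = 0
    usable = 0
    for src, dst in permutations(sorted(partitions), 2):
        total += 1
        # Nodes with no common partition cannot communicate, so a route
        # between them never carries traffic.
        if partitions[src] & partitions[dst]:
            usable += 1
    return total, usable


# Hypothetical example: two tenants with two nodes each.
membership = {"A": {1}, "B": {1}, "C": {2}, "D": {2}}
total, usable = count_routes(membership)
print(total, usable)  # 12 4
```

In this toy setup, only 4 of the 12 ordered pairs can ever communicate, so a scheme that balances all 12 routes spends link capacity on 8 routes that stay idle.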
