MTR: Fault tolerant routing in Clos data center network with miswiring links

The data center network (DCN) is a key component of cloud computing, and its scale keeps growing as cloud computing expands. Without proper engineering management methods, however, engineers may miswire some links while building a DCN; this is called the “miswiring problem”. Miswiring links make the physical topology differ from the design blueprint graph of the DCN, which results in communication errors. Previous works (DAC [1] and ETAC [2]) only detect devices with miswiring links. With DAC, the network cannot work until engineers fix the miswiring links manually, which is a time-consuming and error-prone task. ETAC utilizes only the devices without miswiring links and excludes devices with miswiring links from working, which wastes link resources and reduces network throughput. In this paper, we focus on the miswiring problem in Clos-based DCNs and introduce an effective algorithm to detect and correct miswiring links. Moreover, we propose a miswiring tolerant routing protocol (MTR) that embraces miswiring links, increasing network throughput in their presence. Simulation results show that, for a Fat-Tree network with 128,000 servers, our design can detect and correct miswiring links (when at most 20% of links are miswired) in less than 120 milliseconds. In a 32-ary Fat-Tree network, compared with ECMP, MTR reduces the data transmission completion time by 2.5%, 5.43%, 8.74%, and 11.66% when the percentage of miswiring links is 5%, 10%, 15%, and 20%, respectively.
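To give a sense of the network sizes quoted in the evaluation, the following sketch computes the element counts of a Fat-Tree. It assumes the standard three-tier k-ary Fat-Tree construction (k pods, k/2 edge and k/2 aggregation switches per pod, (k/2)^2 core switches), which the abstract does not spell out; the function name `fat_tree_size` and the dictionary keys are hypothetical and chosen only for illustration.

```python
def fat_tree_size(k: int) -> dict:
    """Element counts for a k-ary Fat-Tree.

    Assumption: the standard three-tier construction, in which every
    switch has k ports. This is not taken from the paper itself.
    """
    if k <= 0 or k % 2 != 0:
        raise ValueError("k must be a positive even integer")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,          # k/2 per pod
        "aggregation_switches": k * half,   # k/2 per pod
        "core_switches": half * half,       # (k/2)^2
        "servers": k * half * half,         # k^3 / 4
        # edge-aggregation links plus aggregation-core links: k^3 / 2
        "switch_links": 2 * k * half * half,
    }


if __name__ == "__main__":
    for k in (32, 80):
        size = fat_tree_size(k)
        print(k, size["servers"], size["switch_links"])
```

Under this assumption, a Fat-Tree with 128,000 servers corresponds to k = 80, and a 32-ary Fat-Tree hosts 8,192 servers.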