Kuijia: Traffic rescaling in data center WANs

Network faults such as link or switch failures can cause heavy congestion and packet loss. Traffic engineering systems take considerable time to detect and react to such faults, which leads to long recovery times. Recent work either pre-installs many backup paths in switches to ensure fast reroute, or proactively reserves bandwidth to achieve fault resilience. Our approach instead reacts to failures quickly in the data plane and eliminates the need to pre-install backup paths. We propose Kuijia, a robust traffic engineering system for data center WANs that relies on a novel data-plane failover mechanism called rate rescaling. The affected flows on failed tunnels are rescaled to the remaining tunnels and enter low-priority queues, so that they do not impair the performance of the flows already on those tunnels. Experiments on a real system show that Kuijia handles network faults effectively and significantly outperforms the conventional rescaling method.
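
The rate rescaling idea can be illustrated with a minimal sketch. It assumes that traffic is split across tunnels in proportion to static weights and that displaced traffic is redistributed over the surviving tunnels in proportion to those same weights; the function name `rescale`, its parameters, and the proportional rule are illustrative assumptions, not the actual Kuijia implementation.

```python
def rescale(demand, weights, failed):
    """Hypothetical sketch of rate rescaling with priority separation.

    demand  -- total rate of the flow group (any consistent unit)
    weights -- {tunnel: split weight} before the failure (assumed static)
    failed  -- set of tunnels that have failed
    Returns {tunnel: (high_priority_rate, low_priority_rate)} for live tunnels.
    """
    alive = {t: w for t, w in weights.items() if t not in failed}
    if not alive:
        raise ValueError("no remaining tunnel to rescale onto")

    total = sum(weights.values())
    alive_total = sum(alive.values())
    # Traffic that was riding the failed tunnels and must be moved.
    displaced = demand * (total - alive_total) / total

    plan = {}
    for tunnel, weight in alive.items():
        # Traffic already on this tunnel keeps its rate and high priority.
        original = demand * weight / total
        # Rescaled traffic is added on top, marked for the low-priority queue.
        extra = displaced * weight / alive_total
        plan[tunnel] = (original, extra)
    return plan


if __name__ == "__main__":
    # Three tunnels with weights 2:2:1; tunnel "t3" fails.
    print(rescale(10, {"t1": 2, "t2": 2, "t3": 1}, {"t3"}))
```

In this sketch the total rate is preserved after the failure, while only the rescaled share competes at low priority, so the flows already on a surviving tunnel are served first if that tunnel becomes congested.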