A failure detection solution for multiple QoS in data center networks

Failures in data center networks can sometimes lead to user-perceived service interruptions, so automated failure detection is needed to maintain the reliability of data centers. However, existing research rarely considers that different devices have different quality-of-service (QoS) requirements for failure detection in data center networks. To tackle this problem, we first divide network devices into two categories: imperative devices, whose failures must be detected in real time, and non-imperative devices. We then use a co-detection approach named K-detectors and a data-mining-based approach to detect failures of these two kinds of devices, respectively. We evaluated our approach on a simulated network built with ns-3. The experimental results show that, for servers, the query accuracy probability improves by 4.62% with only a slight increase in detection time; for links, discrimination improves significantly (by nearly 86%).
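
The server result is stated in terms of query accuracy probability, one of the standard failure-detector QoS metrics alongside detection time, mistake duration and mistake recurrence time. The Python sketch below is illustrative only and is not taken from the paper. It assumes a detector's behaviour is logged as a list of wrong-suspicion intervals over an observation window during which the monitored device is actually up. The names `Mistake`, `query_accuracy_probability`, `mean_mistake_duration` and `mean_mistake_recurrence` are hypothetical. Under that assumption, query accuracy probability is simply the fraction of the window in which the detector's output is correct.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Mistake:
    """One wrong suspicion: the detector suspected a device that was actually up."""
    start: float  # time the wrong suspicion began
    end: float    # time the suspicion was retracted


def query_accuracy_probability(mistakes, horizon):
    """Fraction of the window [0, horizon] in which the detector output is correct."""
    wrong_time = sum(m.end - m.start for m in mistakes)
    return 1.0 - wrong_time / horizon


def mean_mistake_duration(mistakes):
    """Average length of a wrong suspicion, E(T_M)."""
    return mean(m.end - m.start for m in mistakes)


def mean_mistake_recurrence(mistakes):
    """Average time between the starts of consecutive mistakes, E(T_MR)."""
    starts = sorted(m.start for m in mistakes)
    if len(starts) < 2:
        raise ValueError("need at least two mistakes to measure recurrence")
    return mean(b - a for a, b in zip(starts, starts[1:]))


if __name__ == "__main__":
    # Hypothetical trace: three wrong suspicions in a 60-unit observation window.
    trace = [Mistake(10, 11), Mistake(30, 32), Mistake(50, 51)]
    p_a = query_accuracy_probability(trace, horizon=60)
    ratio = 1 - mean_mistake_duration(trace) / mean_mistake_recurrence(trace)
    print(round(p_a, 4), round(ratio, 4))  # 0.9333 0.9333
```

In this made-up trace, the mistakes last 1, 2 and 1 time units and start 20 units apart. The direct time fraction and the ratio form 1 − E(T_M)/E(T_MR) therefore both give 1 − 4/60 ≈ 0.933. The ratio form is a steady-state relation, so on other finite traces the two values need not coincide exactly.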
