The Network Link Outlier Factor (NLOF) for Fault Localization

We describe and experimentally evaluate the Network Link Outlier Factor (NLOF), an outlier score for locating faults in communication networks. An NLOF score is assigned to each link in a network and is computed by a four-stage data analytics pipeline. The inputs to the pipeline are flow records (e.g., NetFlow) and network topology data (e.g., from the Link Layer Discovery Protocol (LLDP)). In the first stage, flow record throughput values are clustered in two sub-stages: first with Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and then with our novel domain-specific ThroughPut Cluster (TPCluster) technique. In the second stage, an outlier score is determined for each flow within its cluster using a measure of proximity to a selected performance exemplar. In the third stage, flows are associated with network links using the topology data. Finally, in the fourth stage, the flow outlier scores are used to compute the outlier factor, or score, for each network link. The network link outlier scores are then used with a detection rule to locate faults. We present the results of a wide set of Mininet experiments that appraise the fault detection and localization performance of NLOF. We find that NLOF detects errors on edge links with a simple detection rule, and errors on core links with a rule that includes topology relationships. We also compare NLOF to an abrupt change detection technique: while both have roughly the same detection power, the precision of NLOF was 42% higher and NLOF required 40% less time on average to detect failures.
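As an illustration only, the last three stages of such a pipeline might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the cluster labels are taken as given (standing in for the DBSCAN and TPCluster stage), the exemplar is assumed to be the highest throughput in a cluster, the link score is assumed to be the mean of the scores of the flows crossing the link, and the detection rule is assumed to be a fixed threshold. All function names, field names, and the example topology are hypothetical.

```python
from collections import defaultdict

def flow_outlier_scores(flows):
    """Stage 2 (assumed form): score each flow by its relative distance
    from the best throughput in its cluster (the assumed exemplar)."""
    exemplar = defaultdict(float)
    for f in flows:
        exemplar[f["cluster"]] = max(exemplar[f["cluster"]], f["throughput"])
    return {f["id"]: 1.0 - f["throughput"] / exemplar[f["cluster"]] for f in flows}

def link_outlier_scores(flows, paths, scores):
    """Stages 3 and 4 (assumed form): map flows to the links on their path,
    then average the flow scores seen on each link."""
    per_link = defaultdict(list)
    for f in flows:
        for link in paths[f["id"]]:  # paths would be derived from topology data
            per_link[link].append(scores[f["id"]])
    return {link: sum(vals) / len(vals) for link, vals in per_link.items()}

def locate_faults(link_scores, threshold=0.5):
    """Hypothetical detection rule: flag links whose score exceeds a threshold."""
    return sorted(link for link, s in link_scores.items() if s > threshold)

# Hypothetical example: two switches (s1, s2), four hosts, degraded link h2-s1.
# Cluster labels are supplied directly in place of the clustering stage.
flows = [
    {"id": "h1>h3", "cluster": 0, "throughput": 95.0},
    {"id": "h1>h4", "cluster": 0, "throughput": 100.0},
    {"id": "h2>h3", "cluster": 0, "throughput": 40.0},
    {"id": "h2>h4", "cluster": 0, "throughput": 38.0},
    {"id": "h3>h4", "cluster": 0, "throughput": 98.0},
]
paths = {
    "h1>h3": ["h1-s1", "s1-s2", "s2-h3"],
    "h1>h4": ["h1-s1", "s1-s2", "s2-h4"],
    "h2>h3": ["h2-s1", "s1-s2", "s2-h3"],
    "h2>h4": ["h2-s1", "s1-s2", "s2-h4"],
    "h3>h4": ["s2-h3", "s2-h4"],
}

scores = flow_outlier_scores(flows)
link_scores = link_outlier_scores(flows, paths, scores)
suspects = locate_faults(link_scores)
print(suspects)
```

In this toy setup the core link s1-s2 also carries the degraded flows, but its score is diluted by the healthy flows that share it, so only the edge link is flagged. Locating a fault on a core link would presumably require the rule to consider topology relationships, as the abstract notes.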
