Peripheral Diagnosis for Propagated Network Faults

Failures are unavoidable in communication networks, so detecting and identifying them is vital for reliable network operation. Existing fault diagnosis techniques draw on paradigms from different areas (e.g., mathematical theories, machine learning, statistical analysis) and serve different purposes, such as obtaining a representation model of the network for fault localization, selecting optimal probe sets for monitoring network devices, reducing fault detection time, and detecting faulty components in the network. Nevertheless, challenges remain because these techniques are invasive: they increase network traffic and control overhead, and they intensify the network's internal processes by deploying management processes or monitoring agents on almost all networking devices. This paper introduces a non-invasive fault detection approach based on observing the symptoms of internal network failures at gateway routers (called peripheral elements). We developed a link failure induction experiment in an emulated network that revealed the propagation of faults to the peripheral level, which demonstrates the feasibility of our approach. Our results encourage the use of learning techniques that do not require a complete dependency model of the network and that could continuously diagnose failure symptoms while remaining resilient to dynamic changes in the network.
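The idea that an internal failure can surface as a symptom at the network's edge can be illustrated with a toy model. The sketch below is not the authors' experiment. It assumes a small hypothetical topology (gateways `G1`..`G3`, core routers `R1`..`R4`) and uses gateway-to-gateway hop count as a stand-in for whatever path metric (latency, loss, reachability) a peripheral element might actually observe, which this abstract does not specify. Failing one internal link changes the metric seen between some gateway pairs, so the failure is visible from the periphery without placing agents on the core routers.

```python
from collections import deque

# Hypothetical topology (illustration only): G* are gateway (peripheral)
# routers, R* are internal core routers. Undirected adjacency sets.
TOPOLOGY = {
    "G1": {"R1"},
    "G2": {"R2"},
    "G3": {"R4"},
    "R1": {"G1", "R2", "R3"},
    "R2": {"G2", "R1", "R4"},
    "R3": {"R1", "R4"},
    "R4": {"G3", "R2", "R3"},
}
GATEWAYS = ["G1", "G2", "G3"]


def hop_count(graph, src, dst):
    """Breadth-first search: shortest hop count from src to dst, or None if unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


def fail_link(graph, a, b):
    """Return a copy of the graph with the undirected link a-b removed (original untouched)."""
    copy = {node: set(nbrs) for node, nbrs in graph.items()}
    copy[a].discard(b)
    copy[b].discard(a)
    return copy


def peripheral_view(graph):
    """Hop counts between every ordered pair of gateways: the only data the periphery sees."""
    return {
        (s, d): hop_count(graph, s, d)
        for s in GATEWAYS
        for d in GATEWAYS
        if s != d
    }


def symptoms(before, after):
    """Gateway pairs whose observed path metric changed after an internal failure."""
    return sorted(pair for pair in before if before[pair] != after[pair])


if __name__ == "__main__":
    baseline = peripheral_view(TOPOLOGY)
    # Induce a failure on an internal link that no gateway is attached to.
    degraded = peripheral_view(fail_link(TOPOLOGY, "R2", "R4"))
    print(symptoms(baseline, degraded))
```

Running it prints `[('G2', 'G3'), ('G3', 'G2')]`: traffic between `G2` and `G3` is rerouted through `R1` and `R3`, growing from 3 to 5 hops, while the `G1` pairs are unaffected. A learning-based diagnoser of the kind the abstract motivates would replace the exact-equality check with a model of normal peripheral behavior.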
