Statistical analysis of network traffic for adaptive faults detection

This paper addresses the problem of baselining normal network operation for the automatic detection of network anomalies. A model of network traffic is presented in which the studied variables are viewed as sampled from a finite mixture model. Based on a stochastic approximation of the maximum likelihood function, we propose baselining normal network operation using the asymptotic distribution of the difference between successive estimates of the model parameters. The baseline random variable is shown to be stationary, with mean zero under normal operation, and anomalous events are shown to induce an abrupt jump in the mean. Detection is formulated as an online change-point problem, where the task is to process realizations of the baseline random variable sequentially and raise an alarm as soon as an anomaly occurs. An analytical expression for the false alarm rate allows the design threshold to be chosen automatically. Extensive experimental results on a real network show that our monitoring agent is able to detect unusual changes in the characteristics of network traffic and adapt to diurnal traffic patterns while maintaining a low alarm rate. Despite large fluctuations in network traffic, this work shows that tailoring traffic modeling to specific goals can be efficiently achieved.
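The online change-point formulation can be illustrated with a minimal sketch. The abstract does not give the paper's test statistic, so the block below substitutes a standard two-sided CUSUM (Page's continuous inspection scheme) applied to a sequence that is assumed to have mean zero under normal operation. The function name `cusum_alarms` and the `drift` and `threshold` values are hypothetical choices for illustration; in particular, the threshold here is set by hand rather than derived from a false alarm rate expression.

```python
def cusum_alarms(values, drift=0.5, threshold=5.0):
    """Two-sided CUSUM over a sequence assumed zero-mean under normal operation.

    `drift` and `threshold` are illustrative values, not taken from the paper.
    Returns the indices at which an alarm is raised; both cumulative
    statistics are reset to zero after each alarm.
    """
    upper = 0.0  # accumulates evidence of an upward jump in the mean
    lower = 0.0  # accumulates evidence of a downward jump in the mean
    alarms = []
    for i, x in enumerate(values):
        upper = max(0.0, upper + x - drift)
        lower = max(0.0, lower - x - drift)
        if upper > threshold or lower > threshold:
            alarms.append(i)
            upper = lower = 0.0
    return alarms


# Synthetic baseline variable: mean zero, then an abrupt jump to 2.0 at index 10.
baseline = [0.0] * 10 + [2.0] * 10
print(cusum_alarms(baseline))  # → [13, 17]
```

In this synthetic run the first alarm fires four samples after the jump, because each post-jump sample adds 2.0 - 0.5 = 1.5 to the upper statistic, which first exceeds 5.0 at index 13. Raising the threshold lowers the false alarm rate at the cost of a longer detection delay.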
