Sketch-based change detection: methods, evaluation, and applications

Traffic anomalies such as failures and attacks are commonplace in today's networks, and identifying them rapidly and accurately is critical for large network operators. Detection typically treats the traffic as a collection of flows that must be examined for significant changes in traffic pattern (e.g., volume, number of connections). However, as link speeds and the number of flows increase, keeping per-flow state becomes either too expensive or too slow. We propose building compact summaries of the traffic data using the notion of sketches. We have designed a variant of the sketch data structure, the k-ary sketch, which uses a small, constant amount of memory and has constant per-record update and reconstruction cost. Its linearity property enables us to summarize traffic at various levels. We then implement a variety of time series forecast models (ARIMA, Holt-Winters, etc.) on top of such summaries and detect significant changes by looking for flows with large forecast errors. We also present heuristics for automatically configuring the model parameters. Using a large amount of real Internet traffic data from an operational tier-1 ISP, we demonstrate that our sketch-based change detection method is highly accurate and can be implemented at low computation and memory costs. Our preliminary results are promising and hint at the possibility of using our method as a building block for network anomaly detection and traffic measurement.
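The pipeline described above (summarize each epoch in a k-ary sketch, forecast in sketch space by exploiting linearity, then flag keys whose forecast error is large) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the abstract. A simple EWMA forecaster stands in for the ARIMA and Holt-Winters models named there. Each row hashes with a degree-3 polynomial modulo the prime 2^61 - 1, which is 4-wise independent before the final reduction mod K. The alarm rule |estimated error| >= T * sqrt(estimated F2 of the error sketch) is used as the threshold. All names (`make_hash_coeffs`, `KarySketch`, `EwmaChangeDetector`, `alpha`, `threshold`) are hypothetical and do not come from the authors' code.

```python
import math
import random
from statistics import median

P = (1 << 61) - 1  # Mersenne prime used as the hash field modulus


def make_hash_coeffs(depth, seed=0):
    """One random degree-3 polynomial per row (4-wise independent mod P)."""
    rng = random.Random(seed)
    return [[rng.randrange(P) for _ in range(4)] for _ in range(depth)]


class KarySketch:
    """H x K table of counters; sketches sharing `coeffs` can be combined linearly."""

    def __init__(self, width, coeffs):
        self.K = width
        self.coeffs = coeffs
        self.table = [[0.0] * width for _ in coeffs]
        self.total = 0.0  # sum of all update values, needed by the estimators

    def _bucket(self, row, key):
        c0, c1, c2, c3 = self.coeffs[row]
        x = key % P
        return ((((c3 * x + c2) * x + c1) * x + c0) % P) % self.K

    def update(self, key, value):
        # Constant per-record cost: one counter per row.
        for i, row in enumerate(self.table):
            row[self._bucket(i, key)] += value
        self.total += value

    def estimate(self, key):
        # Remove the expected contribution of colliding keys, then take the median over rows.
        K = self.K
        return median((row[self._bucket(i, key)] - self.total / K) / (1 - 1 / K)
                      for i, row in enumerate(self.table))

    def estimate_f2(self):
        # Per-row unbiased estimate of the sum of squared key values, median over rows.
        K = self.K
        return median(K / (K - 1) * sum(x * x for x in row) - self.total ** 2 / (K - 1)
                      for row in self.table)

    def combine(self, other, a=1.0, b=1.0):
        """Return the sketch a*self + b*other (valid only with identical hash functions)."""
        assert self.K == other.K and self.coeffs == other.coeffs
        out = KarySketch(self.K, self.coeffs)
        out.table = [[a * x + b * y for x, y in zip(r1, r2)]
                     for r1, r2 in zip(self.table, other.table)]
        out.total = a * self.total + b * other.total
        return out


class EwmaChangeDetector:
    """EWMA forecast in sketch space: F(t+1) = alpha*O(t) + (1 - alpha)*F(t)."""

    def __init__(self, alpha=0.5, threshold=0.5):
        self.alpha, self.threshold = alpha, threshold
        self.forecast = None

    def step(self, observed, candidate_keys):
        if self.forecast is None:  # first epoch: seed the forecast with a copy
            self.forecast = observed.combine(observed, 1.0, 0.0)
            return []
        error = observed.combine(self.forecast, 1.0, -1.0)  # forecast-error sketch
        l2 = math.sqrt(max(error.estimate_f2(), 0.0))
        flagged = []
        if l2 > 1e-9:  # an all-zero error sketch means no change at all
            flagged = [k for k in candidate_keys
                       if abs(error.estimate(k)) >= self.threshold * l2]
        self.forecast = observed.combine(self.forecast, self.alpha, 1 - self.alpha)
        return flagged
```

For example, feeding 20 keys at a steady volume of 10 per epoch and then letting key 7 jump to 500 in the sixth epoch flags only key 7 in that epoch. The design choice worth noting is that the forecaster never touches per-flow state: every forecast and error is a linear combination of fixed-size sketches, and only the candidate keys are re-examined when thresholding.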
