An Efficient Algorithm for Measuring Medium- to Large-Sized Flows in Network Traffic

It is well recognized that identifying very large flows (i.e., elephants) in a network traffic stream is important for a variety of network applications, ranging from traffic engineering to anomaly detection. However, we found that many of these applications increasingly need to monitor not only the few largest flows (say, the top 20) but also all of the medium-sized flows (say, the top 20,000). Unfortunately, existing techniques for identifying elephant flows at high link speeds are unsuitable for identifying medium-sized flows and cannot be trivially extended to do so. In this work, we propose a hybrid SRAM/DRAM algorithm for monitoring all elephant and medium-sized flows with strong accuracy guarantees. We employ a synopsis data structure (sketch) in SRAM to filter out small flows and to preferentially sample medium and large flows into a flow table in DRAM. Our key contribution is to show how to make maximal use of the available SRAM and DRAM with a hybrid SRAM/DRAM data structure that achieves more than an order of magnitude higher SRAM efficiency than previous methods. We design a quantization scheme that allows our algorithm to "read just enough" from the sketch at SRAM speed without sacrificing much estimation accuracy. We provide analytical guarantees on the accuracy of the estimation and validate them through trace-driven evaluation using real-world packet traces.
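The division of labor between the sketch and the flow table can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this work: it assumes a count-min sketch as the synopsis, replaces the preferential sampling step with a fixed promotion threshold, and omits the quantization scheme entirely. The names `CountMinSketch`, `monitor`, and `threshold` are hypothetical, and ordinary Python objects stand in for SRAM and DRAM.

```python
import hashlib


class CountMinSketch:
    """Small counter array standing in for the SRAM-resident synopsis (assumption)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One salted hash per row; the salt makes the rows behave independently.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key):
        # Count-min never underestimates: take the minimum over all rows.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


def monitor(packets, threshold=100, width=1024, depth=4):
    """Filter small flows in the sketch; track larger ones exactly in a table.

    `packets` is an iterable of flow identifiers, one per packet.
    The fixed `threshold` is a simplification of preferential sampling.
    """
    sketch = CountMinSketch(width, depth)
    flow_table = {}  # stands in for the DRAM-resident flow table
    for flow_id in packets:
        if flow_id in flow_table:
            # Already promoted: count exactly, bypassing the sketch.
            flow_table[flow_id] += 1
            continue
        sketch.add(flow_id)
        estimate = sketch.estimate(flow_id)
        if estimate >= threshold:
            # Promote the flow, seeding the table with the sketch estimate.
            flow_table[flow_id] = estimate
    return flow_table
```

Calling `monitor` on a stream containing one 500-packet flow, one 150-packet flow, and many single-packet flows should leave only the two larger flows in the table, while the small flows never leave the sketch. Because count-min estimates can only overcount, a promoted flow's table entry may be slightly larger than its true size when hash collisions occur.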
