A Mathematical Framework for the Detection of Elephant Flows

How large is a network flow? Traditionally, this question has been addressed using metrics such as the number of bytes, the transmission rate, or the duration of a flow. We reason that a formal mathematical definition of flow size should account for the impact a flow has on the performance of a network: flows that have the largest impact should have the largest size. In this paper we present a theory of flow ordering that reveals the connection between the abstract concept of flow size and the quality of service (QoS) properties of a network. The theory is generalized to accommodate the case of partial information, allowing us to model real computer network scenarios such as those found in involuntary lossy environments or voluntary packet sampling protocols (e.g., sFlow). We explore one application of this theory to the problem of elephant flow detection at very high speeds. The resulting algorithm uses the information-theoretic properties of the problem to reduce the computational cost by a factor of one thousand.
