Flow classification by histograms: or how to go on safari in the internet

To control and manage highly aggregated Internet traffic flows efficiently, we need to categorize flows into distinct classes and to understand how flows belonging to each class behave. In this paper we consider the problem of classifying BGP-level prefix flows into a small set of homogeneous classes. We argue that using the full distributional properties of flows can significantly improve the quality of the derived classification. We propose a method that models flow histograms using Dirichlet Mixture Processes for random distributions. We present an inference procedure based on the Simulated Annealing Expectation Maximization (SAEM) algorithm that estimates all the model parameters as well as the flow membership probabilities, i.e., the probability that a flow belongs to any given class. One of our key contributions is therefore a new method for Internet flow classification. We show that our method can examine macroscopic flows while simultaneously making fine distinctions between different traffic classes. We demonstrate that our scheme can handle flows that lie close to class boundaries as well as the inherent dynamic behavior of Internet flows.
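
To make the notions of "flow membership probabilities" and "SAEM inference" concrete, here is a minimal sketch. It is not the paper's method: it swaps the Dirichlet Mixture Process for a much simpler mixture of multinomials over histogram bins, so each class is just one probability vector over bins. The annealed update follows the general idea of simulated annealing EM: at each iteration it blends a deterministic EM estimate with a stochastic EM estimate (computed from class labels sampled from the memberships), and the blending weight `gamma` decays to zero. The linear `gamma` schedule, add-one smoothing, farthest-point initialization, and all function names (`saem`, `m_step`, `memberships`, `init_params`) are hypothetical choices made for this illustration.

```python
import math
import random

# Simplified stand-in (assumption): each class is a multinomial over histogram
# bins, NOT the Dirichlet mixture model described in the abstract.

def log_kernel(hist, probs):
    """Log-likelihood of a histogram under one class, dropping the multinomial
    coefficient (identical for every class, so it cancels when normalizing)."""
    return sum(c * math.log(p) for c, p in zip(hist, probs) if c > 0)


def memberships(hists, weights, probs):
    """E-step: probability that each flow belongs to each class."""
    out = []
    for h in hists:
        logs = [math.log(w) + log_kernel(h, p) for w, p in zip(weights, probs)]
        top = max(logs)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in logs]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def m_step(hists, resp, n_classes, alpha=1.0):
    """M-step with add-alpha smoothing so no class or bin gets probability 0."""
    n, n_bins = len(hists), len(hists[0])
    weights = [(sum(r[k] for r in resp) + alpha) / (n + n_classes * alpha)
               for k in range(n_classes)]
    probs = []
    for k in range(n_classes):
        counts = [alpha + sum(r[k] * h[b] for r, h in zip(resp, hists))
                  for b in range(n_bins)]
        total = sum(counts)
        probs.append([c / total for c in counts])
    return weights, probs


def init_params(hists, n_classes, rng, alpha=1.0):
    """Hypothetical farthest-point seeding: each class starts from one flow."""
    def normalize(h):
        total = sum(h) + alpha * len(h)
        return [(c + alpha) / total for c in h]
    shapes = [normalize(h) for h in hists]
    seeds = [rng.randrange(len(hists))]
    while len(seeds) < n_classes:
        # L1 distance from each flow's shape to its nearest already-chosen seed
        dist = [min(sum(abs(a - b) for a, b in zip(s, shapes[j])) for j in seeds)
                for s in shapes]
        seeds.append(max(range(len(shapes)), key=dist.__getitem__))
    return [1.0 / n_classes] * n_classes, [shapes[j] for j in seeds]


def saem(hists, n_classes, n_iter=40, seed=0):
    rng = random.Random(seed)
    weights, probs = init_params(hists, n_classes, rng)
    for t in range(n_iter):
        # Annealing weight: fully stochastic at t=0, plain EM after n_iter/2.
        gamma = max(0.0, 1.0 - 2.0 * t / n_iter)
        resp = memberships(hists, weights, probs)
        # Stochastic step: draw one class label per flow from its memberships.
        hard = []
        for r in resp:
            k = rng.choices(range(n_classes), weights=r)[0]
            hard.append([1.0 if j == k else 0.0 for j in range(n_classes)])
        w_em, p_em = m_step(hists, resp, n_classes)
        w_sem, p_sem = m_step(hists, hard, n_classes)
        # Annealed update: convex combination of the EM and stochastic-EM estimates.
        weights = [(1 - gamma) * a + gamma * b for a, b in zip(w_em, w_sem)]
        probs = [[(1 - gamma) * a + gamma * b for a, b in zip(pa, pb)]
                 for pa, pb in zip(p_em, p_sem)]
    return weights, probs, memberships(hists, weights, probs)


# Usage on toy 3-bin histograms (made-up data, for illustration only).
flows_a = [[40, 8, 2], [45, 4, 1], [38, 10, 2], [42, 6, 2]]
flows_b = [[2, 8, 40], [1, 5, 44], [3, 9, 38], [2, 6, 42]]
weights, probs, resp = saem(flows_a + flows_b, n_classes=2)
for r in resp:
    print([round(p, 3) for p in r])
```

The stochastic step lets early iterations move flows between classes, which can help the procedure escape poor local optima. As `gamma` reaches zero, the update reduces to ordinary EM, and the returned soft memberships play the role of the per-flow class probabilities that the abstract describes.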
