Efficient phase-type fitting with aggregated traffic traces

Approximating the empirical distribution of a measured data trace by a phase-type (PH) distribution has significant applications in the analysis of stochastic models. Many methods and tools exist for phase-type fitting. One drawback they all share is that the fitting effort depends strongly on the size of the data trace to be fitted. Because large traces are needed to capture rare events, which have a strong impact on system performance, current fitting procedures are very time-consuming. In this paper, we introduce a method to generate an aggregated trace from the original trace, and we show how to use the aggregated trace effectively within a PH fitting approach called G-FIT. In particular, we show that the elements of a large traffic trace can be aggregated into 50 to 200 weighted elements while fitting accuracy remains the same as when fitting the original trace. As a result, the CPU time required for fitting PH distributions decreases by about four orders of magnitude, so that traces with ten million elements can be fitted accurately in a few seconds. The effectiveness of the proposed method is demonstrated on a set of benchmark traces and two real traffic traces, and with quantitative results from queueing analysis.
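
The abstract does not say how the aggregation is carried out, so the following is only an illustrative sketch of the general idea, not the paper's procedure. It assumes logarithmically spaced bins, with each bin replaced by its mean value and its relative frequency as weight. The names `aggregate_trace`, `weighted_moment`, and `bins_per_decade` are hypothetical.

```python
import math


def aggregate_trace(trace, bins_per_decade=10):
    """Reduce a trace of positive samples to weighted (value, weight) points.

    Assumption (not taken from the paper): samples are grouped into
    logarithmically spaced bins; each bin is represented by the mean of its
    samples and weighted by its share of the trace.
    """
    if not trace:
        raise ValueError("trace must not be empty")
    sums, counts = {}, {}
    for x in trace:
        if x <= 0:
            raise ValueError("log-spaced binning needs positive samples")
        # Bin index: bins_per_decade bins for every factor of ten.
        k = math.floor(math.log10(x) * bins_per_decade)
        sums[k] = sums.get(k, 0.0) + x
        counts[k] = counts.get(k, 0) + 1
    n = len(trace)
    # Sorted by bin index, so the points come out in increasing value order.
    return [(sums[k] / counts[k], counts[k] / n) for k in sorted(sums)]


def weighted_moment(points, order=1):
    """Moment of the aggregated trace: sum of weight * value**order."""
    return sum(w * v ** order for v, w in points)


if __name__ == "__main__":
    # Deterministic synthetic trace spanning five decades (0.001 to 100).
    trace = [0.001 * i for i in range(1, 100001)]
    points = aggregate_trace(trace)
    print(len(trace), "samples ->", len(points), "weighted points")
    print("trace mean     :", sum(trace) / len(trace))
    print("aggregated mean:", weighted_moment(points, 1))
```

Because each bin is represented by its own mean, the first moment of the aggregated trace matches that of the original trace up to rounding. Higher moments are only approximated, and the approximation tightens as `bins_per_decade` grows. A fitting routine that accepts weighted samples would then iterate over the few dozen points instead of the full trace.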
