A novel approach for fitting probability distributions to real trace data with the EM algorithm

The representation of general distributions or measured data by phase-type distributions is an important and non-trivial task in analytical modeling. Although many methods exist for fitting the parameters of phase-type distributions to data traces, a large number of them lack efficiency and numerical stability. In this paper, a novel approach is presented that fits a restricted class of phase-type distributions, namely mixtures of Erlang distributions, to trace data. For the parameter fitting, an algorithm of the expectation-maximization (EM) type is developed. The paper shows that these choices result in a very efficient and numerically stable approach. For a wide range of data traces, it yields phase-type approximations that are as good as or better than those computed with other, less efficient and less stable, fitting methods. To illustrate the effectiveness of the proposed fitting algorithm, we present comparative results for our approach and two other methods using six benchmark traces and two real traffic traces.
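The abstract does not give the update equations, so the following is only a minimal sketch of the general idea under stated assumptions. It is a textbook EM iteration for a mixture of Erlang distributions (a hyper-Erlang distribution) in which the number of branches and the number of phases r_i of each branch are fixed in advance, and only the branch probabilities alpha_i and the rates lambda_i are estimated. The function names `erlang_logpdf` and `fit_erlang_mixture`, the initialisation, and the stopping rule are hypothetical choices, not the paper's. In the E-step, each trace value x_k receives a posterior probability w_ik of having been generated by branch i. The M-step then has closed forms: alpha_i = (1/K) sum_k w_ik and lambda_i = r_i sum_k w_ik / sum_k w_ik x_k. Densities are combined in log space as one simple way to avoid underflow. Trace values are assumed to be strictly positive.

```python
import math
import random


def erlang_logpdf(x, r, lam):
    """Log-density of an Erlang distribution with integer shape r and rate lam (x > 0)."""
    return r * math.log(lam) + (r - 1) * math.log(x) - lam * x - math.lgamma(r)


def fit_erlang_mixture(data, shapes, iters=200, tol=1e-9):
    """Sketch of EM for a mixture of Erlang branches with FIXED integer shapes.

    Only the branch probabilities alpha[i] and the rates lam[i] are estimated.
    Returns (alpha, lam, loglik), where loglik is the log-likelihood computed
    in the last E-step (i.e. before the final M-step update).
    """
    n, m = len(data), len(shapes)
    mean = sum(data) / n
    alpha = [1.0 / m] * m
    # Hypothetical initialisation: spread branch means around the sample mean.
    lam = [shapes[i] / (mean * 2.0 * (i + 1) / (m + 1)) for i in range(m)]
    prev = -math.inf
    loglik = -math.inf
    for _ in range(iters):
        # E-step: accumulate posterior branch probabilities w_ik and w_ik * x_k.
        sums_w = [0.0] * m
        sums_wx = [0.0] * m
        loglik = 0.0
        for x in data:
            logs = [math.log(alpha[i]) + erlang_logpdf(x, shapes[i], lam[i])
                    for i in range(m)]
            top = max(logs)  # log-sum-exp for numerical stability
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += norm
            for i in range(m):
                w = math.exp(logs[i] - norm)
                sums_w[i] += w
                sums_wx[i] += w * x
        # M-step: closed-form maximisers of the expected complete-data log-likelihood.
        # The floor on alpha only guards against log(0) if a branch underflows.
        alpha = [max(s / n, 1e-300) for s in sums_w]
        lam = [shapes[i] * sums_w[i] / sums_wx[i] if sums_wx[i] > 0 else lam[i]
               for i in range(m)]
        # Stop once the (monotone) log-likelihood gain becomes negligible.
        if loglik - prev < tol * abs(loglik):
            break
        prev = loglik
    return alpha, lam, loglik


if __name__ == "__main__":
    random.seed(1)
    # Synthetic trace: 30% Erlang(1, rate 2.0), 70% Erlang(3, rate 0.5).
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    trace = [random.gammavariate(1, 1 / 2.0) if random.random() < 0.3
             else random.gammavariate(3, 1 / 0.5) for _ in range(2000)]
    alpha, lam, ll = fit_erlang_mixture(trace, shapes=[1, 3])
    print("alpha:", alpha, "rates:", lam, "log-likelihood:", ll)
```

Because the M-step sets each branch mean r_i/lambda_i to the posterior-weighted mean of the samples, the fitted mixture reproduces the sample mean after every iteration. How the shapes r_i are chosen (for example, by trying several configurations and keeping the one with the highest likelihood) is left open here; that is an assumption, not something the abstract states.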
