An Online Approach to Estimate Parameters of Phase-Type Distributions

The expectation-maximization (EM) algorithm is a general-purpose algorithm for maximum likelihood estimation in problems with incomplete data. Several variants of the algorithm exist for estimating the parameters of phase-type distributions (PHDs), a class of distributions widely used in performance and dependability modeling. EM algorithms are typically offline algorithms because they improve the likelihood by iterating repeatedly over a fixed sample. In most systems today, however, data are generated continuously, so offline algorithms are poorly suited to this setting. This paper proposes an online EM algorithm for parameter estimation of PHDs. In contrast to the offline version, the online variant incorporates each observation as soon as it becomes available and does not iterate over the data. Several variants of the algorithm are proposed that exploit the specific structure of subclasses of PHDs such as hyperexponential, hyper-Erlang, or acyclic PHDs. The algorithm furthermore incorporates current methods for detecting drifts or change points in a data stream and estimates a new PHD whenever such behavior is identified. The resulting distributions can thus be applied to online model prediction and to the generation of inhomogeneous PHDs as an extension of inhomogeneous Poisson processes. Numerical experiments with artificial and measured data streams show the applicability of the approach.
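To make the idea of an iteration-free update concrete, the following is a minimal sketch of an online EM step for the simplest subclass mentioned above, the hyperexponential distribution (a mixture of exponentials). It is not the paper's algorithm: it assumes the generic stochastic-approximation form of online EM, in which running averages of the expected sufficient statistics are updated with a decreasing step size, and it omits drift and change-point detection entirely. The class name `OnlineHyperExpEM`, the step-size exponent `alpha`, the `burn_in` length, and the helper `demo` are illustrative assumptions.

```python
import math
import random


class OnlineHyperExpEM:
    """Sketch of online EM for a hyperexponential distribution.

    Assumption: statistics are updated as s <- (1 - g) * s + g * E[S | x]
    with step size g = (n + 1) ** (-alpha); this is a generic online EM
    scheme, not necessarily the variant proposed in the paper.
    """

    def __init__(self, weights, rates, alpha=0.6, burn_in=20):
        self.weights = list(weights)   # mixing probabilities, sum to 1
        self.rates = list(rates)       # exponential rates, all > 0
        self.alpha = alpha             # step-size exponent in (0.5, 1]
        self.burn_in = burn_in         # observations before the first M-step
        self.n = 0
        # Running statistics, initialised consistently with the start values:
        # s0[i] ~ E[resp_i], s1[i] ~ E[resp_i * x] = weight_i / rate_i.
        self.s0 = list(weights)
        self.s1 = [w / r for w, r in zip(weights, rates)]

    def update(self, x):
        """Process one observation x >= 0; no pass over past data is made."""
        self.n += 1
        gamma = (self.n + 1) ** (-self.alpha)
        # E-step for a single observation, computed in log space for stability.
        logs = [math.log(w) + math.log(r) - r * x
                for w, r in zip(self.weights, self.rates)]
        top = max(logs)
        resp = [math.exp(v - top) for v in logs]
        norm = sum(resp)
        resp = [p / norm for p in resp]
        # Stochastic-approximation update of the sufficient statistics.
        for i, p in enumerate(resp):
            self.s0[i] = (1.0 - gamma) * self.s0[i] + gamma * p
            self.s1[i] = (1.0 - gamma) * self.s1[i] + gamma * p * x
        # M-step in closed form, skipped during the burn-in phase.
        if self.n > self.burn_in:
            total = sum(self.s0)
            self.weights = [s / total for s in self.s0]
            self.rates = [a / b for a, b in zip(self.s0, self.s1)]

    def mean(self):
        """Mean of the currently fitted hyperexponential distribution."""
        return sum(w / r for w, r in zip(self.weights, self.rates))


def demo(num_samples=20000, seed=1):
    """Feed a synthetic two-phase hyperexponential stream to the estimator."""
    rng = random.Random(seed)
    est = OnlineHyperExpEM(weights=[0.5, 0.5], rates=[1.0, 2.0])
    for _ in range(num_samples):
        rate = 0.5 if rng.random() < 0.3 else 5.0   # true weights 0.3 / 0.7
        est.update(rng.expovariate(rate))
    return est
```

Each call to `update` costs time proportional to the number of phases and stores nothing but the running statistics, which is what distinguishes this scheme from an offline EM that revisits the whole sample in every iteration. Handling hyper-Erlang or acyclic PHDs would require different sufficient statistics, and a drift detector would have to reset or re-weight the statistics; neither is shown here.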
