A hybrid Markov chain modeling architecture for workload on parallel computers

This paper proposes a comprehensive modeling architecture for workloads on parallel computers that combines Markov chains with state-dependent empirical distribution functions. This hybrid approach is based on the requirements of scheduling algorithms: the model considers four essential job attributes, namely submission time, number of required processors, estimated processing time, and actual processing time. Assessing the goodness of fit of a workload model requires capturing the similarity between sequences of real jobs and sequences of jobs generated from the model. We propose to reduce the complexity of this task by instead evaluating the model through the results of a widely used scheduling algorithm. This approach is demonstrated with commonly used scheduling objectives such as the Average Weighted Response Time and total Utilization. We compare the outcomes for simulated workload traces generated by our model with those for an original workload trace from a real Massively Parallel Processing system installation. To verify this new evaluation technique, standard criteria for assessing the goodness of fit of workload models are additionally applied.
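The hybrid idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a first-order Markov chain over one already discretized job attribute (for example, classes of processor counts), with a second attribute (for example, runtime) drawn from the empirical distribution observed in the current state. The function names `fit_hybrid_model` and `generate_jobs`, the choice of attributes, and the restart rule for states without observed successors are all hypothetical.

```python
import random


def fit_hybrid_model(states, values):
    """Fit transition counts and per-state empirical samples.

    states: discrete state of each job, in submission order (assumed
            to be pre-discretized, e.g. processor-count classes).
    values: a second attribute of each job (e.g. runtime), same length.
    """
    transitions = {}  # state -> {next_state: observed count}
    samples = {}      # state -> list of values observed in that state
    for i, (state, value) in enumerate(zip(states, values)):
        samples.setdefault(state, []).append(value)
        if i + 1 < len(states):
            successors = transitions.setdefault(state, {})
            nxt = states[i + 1]
            successors[nxt] = successors.get(nxt, 0) + 1
    return transitions, samples


def generate_jobs(transitions, samples, start, n, rng):
    """Generate n synthetic (state, value) pairs from the fitted model."""
    jobs = []
    state = start
    for _ in range(n):
        # Drawing uniformly from the observed values is equivalent to
        # sampling from the state-dependent empirical distribution.
        jobs.append((state, rng.choice(samples[state])))
        successors = transitions.get(state)
        if not successors:
            # Assumption: a state with no observed successor restarts
            # the chain at the start state.
            state = start
            continue
        candidates = list(successors)
        weights = [successors[c] for c in candidates]
        state = rng.choices(candidates, weights=weights, k=1)[0]
    return jobs


if __name__ == "__main__":
    observed_states = [1, 2, 1, 2, 1]
    observed_values = [10, 20, 11, 21, 12]
    model = fit_hybrid_model(observed_states, observed_values)
    for job in generate_jobs(*model, start=1, n=6, rng=random.Random(0)):
        print(job)
```

A synthetic trace produced this way could then be fed to a scheduler simulation alongside the original trace, so that the resulting scheduling objectives can be compared as the abstract describes.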
