Realistic Workload Modeling and Its Performance Impacts in Large-Scale eScience Grids

Grid computing has proven to be a successful paradigm for large-scale distributed data processing, and global eScience grids such as LCG and OSG have been in production for years. The majority of applications running in these production environments can be characterized as massive CPU-intensive batch jobs (or "bag-of-tasks"), sometimes considered the "killer" application for the grid. A deep understanding of their main workload characteristics is not only necessary for realistic performance evaluation of existing systems, but also crucial for generating new insights into better resource allocation schemes. This paper presents a comprehensive statistical analysis of the workloads on production eScience grid environments. We focus on second-order statistics and the scaling behavior of the main job characteristics, namely job arrivals and job runtimes. A range of autocorrelation structures is identified and analyzed, including pseudoperiodicity, short-range dependence (SRD), and long-range dependence (LRD). We further develop mathematical models that capture these salient properties of the workloads. The workload models, in turn, enable us to quantitatively evaluate the performance impacts of autocorrelations in grid scheduling. The results indicate that autocorrelations in workloads degrade system performance, sometimes by several orders of magnitude. Nevertheless, better performance can be achieved at the grid level under bursty local background workloads. These effects of workloads on systems are extensively analyzed and explained.
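As a rough illustration of the second-order statistics mentioned in the abstract, the sketch below computes the standard sample autocorrelation function of a series such as job arrival counts per time interval. It is not the paper's analysis code; the function name `sample_acf` and the toy `arrival_counts` series are assumptions made only for this example.

```python
from statistics import fmean


def sample_acf(series, max_lag):
    """Return the sample autocorrelation of `series` for lags 0..max_lag.

    Uses the usual biased estimator: the lag-k autocovariance divided by
    the lag-0 autocovariance, both normalized by the series length.
    """
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mean = fmean(series)
    centered = [value - mean for value in series]
    c0 = sum(d * d for d in centered)  # lag-0 sum of squares
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


if __name__ == "__main__":
    # Hypothetical arrival counts per interval with a period-4 pattern.
    arrival_counts = [12, 3, 2, 4] * 6
    for lag, rho in enumerate(sample_acf(arrival_counts, 8)):
        print(f"lag {lag}: {rho:+.3f}")
```

In a series with a repeating pattern, the autocorrelation peaks again at multiples of the period, which is the kind of signature described as pseudoperiodicity. Autocorrelations that decay quickly with lag are typical of SRD, while a slow, power-law-like decay is the usual indicator of LRD.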
