Analyzing the EGEE Production Grid Workload: Application to Jobs Submission Optimization

On production infrastructures, grid reliability remains an order of magnitude below that of clusters. This work aims to improve grid application performance by improving the job submission system. A stochastic model capturing the behavior of a complex grid workload management system is proposed. To instantiate the model, detailed statistics are extracted from dense grid activity traces. The model is then exploited in a simple job resubmission strategy. It provides quantitative inputs for improving job submission performance, and it makes it possible to quantify the impact of faults and outliers on grid operations.
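
As an illustration of how a latency model can drive a resubmission strategy, the Python sketch below estimates the expected total latency of a job that is cancelled and resubmitted whenever it exceeds a fixed timeout t. It rests on simplifying assumptions that may differ from the paper's model. Latencies are treated as i.i.d. samples from traces, and each submission independently turns into a fault or outlier that never completes, with probability rho. Under those assumptions an attempt succeeds with probability q = (1 - rho) F_R(t), the number of failed attempts is geometric, and the expectation is E[R | R <= t] + t (1/q - 1). The functions `expected_latency` and `best_timeout` are hypothetical names, and the sample data is made up.

```python
import math
from typing import Sequence


def expected_latency(samples: Sequence[float], rho: float, timeout: float) -> float:
    """Expected total latency under timeout-based resubmission (simplified model).

    samples: observed latencies of jobs that did complete (hypothetical trace data).
    rho: assumed probability that a submission is a fault/outlier that never completes.
    timeout: delay after which a job is cancelled and resubmitted.
    """
    if not samples:
        raise ValueError("need at least one latency sample")
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")
    below = [x for x in samples if x <= timeout]
    if not below:
        return math.inf  # no attempt can finish before the timeout
    f_r = len(below) / len(samples)         # empirical F_R(timeout)
    q = (1.0 - rho) * f_r                   # P(one attempt succeeds before timeout)
    mean_success = sum(below) / len(below)  # E[R | R <= timeout]
    # Expected number of failed attempts is (1 - q) / q; each one costs `timeout`.
    return mean_success + timeout * (1.0 / q - 1.0)


def best_timeout(samples: Sequence[float], rho: float) -> float:
    # Between two consecutive sample values the objective grows linearly with the
    # timeout, so its minimum is reached at one of the observed latencies.
    return min(sorted(set(samples)), key=lambda t: expected_latency(samples, rho, t))


if __name__ == "__main__":
    latencies = [10.0, 20.0, 30.0, 1000.0]  # made-up latencies in seconds
    print(best_timeout(latencies, rho=0.0))
```

With these made-up samples, a 30 s timeout beats waiting for the slow 1000 s job, because resubmitting a late job is cheaper than waiting for an outlier. Raising rho increases the penalty of every attempt and shifts the trade-off accordingly.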
