Simbatch: An API for Simulating and Predicting the Performance of Parallel Resources Managed by Batch Systems

In this paper, we describe Simbatch, an API that offers the core functionality needed to realistically simulate parallel resources and batch reservation systems. The objective is twofold: to provide a tool that efficiently predicts the usage of parallel resources by simulating them, and to enable the realistic study of Grid scheduling heuristics that may be embedded in a Grid middleware or in a tool that deploys it. Such predictions can be used in a Grid middleware both for scheduling and to dynamically tune moldable applications, on behalf of the Grid user, according to the load of the chosen parallel resource. In simulation experiments, Simbatch shows an average error rate under 2% compared with real-life experiments conducted with the OAR batch manager.
