Scheduling distributed applications: the SimGrid simulation framework

Since the advent of distributed computer systems, an active field of research has been the investigation of scheduling strategies for parallel applications. The common approach is to employ scheduling heuristics that approximate an optimal schedule. Unfortunately, it is often impossible to obtain analytical results to compare the efficacy of these heuristics. One possibility is to conduct large numbers of back-to-back experiments on real platforms. While this is possible on tightly-coupled platforms, it is infeasible on modern distributed platforms (i.e., Grids), as it is labor-intensive and does not yield repeatable results. The solution is to resort to simulation. Simulations not only enable repeatable results but also make it possible to explore wide ranges of platform and application scenarios. In this paper we present the SimGrid framework, which enables the simulation of distributed applications in distributed computing environments for the specific purpose of developing and evaluating scheduling algorithms. This paper focuses on SimGrid v2, which greatly improves on the first version of the software with more realistic network models and topologies. SimGrid v2 also enables the simulation of distributed scheduling agents, which has become critical for current scheduling research on large-scale platforms. After describing and validating these features, we present a case study that demonstrates the usefulness of SimGrid for conducting scheduling research.
