Dynamic balancing of communication and computation load for HLA-based simulations on large-scale distributed systems

Dynamic balancing of computation and communication load is vital for the execution stability and performance of distributed, parallel simulations deployed on the shared, unreliable resources of large-scale environments. High Level Architecture (HLA) based simulations can suffer performance degradation from imbalances that are present at start-up or that arise during run time. These imbalances are caused by dynamic load changes within the distributed simulation itself or by unknown, unmanaged background processes, which exist because the shared resources are not dedicated to the simulation. Because the elements that compose a distributed application behave dynamically, the computational load and interaction dependencies of each simulation entity change during run time. These changes lead to an irregular distribution of load and communication, which increases resource overhead and latency. Static load partitioning is suited only to deterministic applications and cannot anticipate the dynamic changes caused by the distributed application or by external background processes. Therefore, a scheme for balancing communication and computational load during the execution of distributed simulations is devised within a scalable hierarchical architecture. The proposed balancing system employs local and cluster monitoring mechanisms to observe distributed load changes and identify imbalances, and repartitioning policies to determine a load distribution that minimizes those imbalances. It also employs a migration technique to perform reliable, low-latency load transfers. The system improves the use of shared resources and increases the performance of distributed simulations by minimizing communication latencies and partitioning the load evenly. Experiments and comparative analyses were conducted to identify the gains that the proposed balancing scheme provides to large-scale distributed simulations.
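The monitor, repartition, and migrate cycle described above can be illustrated with a minimal sketch. This is not the scheme proposed in the paper: it is a simple greedy heuristic that considers computational load only and ignores communication dependencies. The function names, the `tolerance` threshold, and the node and federate labels are all hypothetical, chosen for illustration.

```python
from statistics import mean


def find_imbalance(loads, tolerance=0.25):
    """Return the mean load and the nodes whose load exceeds it by more
    than `tolerance` (a hypothetical relative threshold)."""
    avg = mean(loads.values())
    overloaded = [n for n, load in loads.items() if load > avg * (1 + tolerance)]
    return avg, overloaded


def plan_migrations(federates, tolerance=0.25):
    """Greedy repartitioning sketch.

    `federates` maps node -> {federate name -> load}. The result is a list
    of (federate, source node, destination node) moves. Communication cost
    is deliberately ignored here (an assumption of this sketch).
    """
    # "Monitoring": aggregate the per-federate loads observed on each node.
    loads = {node: sum(feds.values()) for node, feds in federates.items()}
    avg, overloaded = find_imbalance(loads, tolerance)
    moves = []
    for src in overloaded:
        # Try the heaviest federates first.
        for fed, cost in sorted(federates[src].items(), key=lambda kv: -kv[1]):
            if loads[src] <= avg * (1 + tolerance):
                break  # source node is back within tolerance
            dst = min(loads, key=loads.get)  # least-loaded node so far
            # Skip moves that would leave the destination at least as
            # loaded as the source currently is.
            if dst == src or loads[dst] + cost >= loads[src]:
                continue
            moves.append((fed, src, dst))
            loads[src] -= cost
            loads[dst] += cost
    return moves


if __name__ == "__main__":
    cluster = {
        "node_a": {"f1": 5, "f2": 4, "f3": 3},
        "node_b": {"f4": 1},
        "node_c": {"f5": 2},
    }
    for fed, src, dst in plan_migrations(cluster):
        print(f"migrate {fed}: {src} -> {dst}")
```

In a hierarchical deployment, a planner of this kind would plausibly run first within each cluster and only then across clusters, and a realistic cost function would also weigh the communication latency between interacting federates.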
