Master/worker parallel discrete event simulation

Recent advances in metacomputing, such as volunteer and desktop grid computing, aggregate loosely coupled resources and have transformed the execution of certain computational workloads that were previously reserved for dedicated clusters. Parallel discrete event simulations have different requirements from programs that readily exploit loosely coupled resources, such as embarrassingly parallel codes. Consequently, parallel discrete event simulations are typically run on tightly coupled machines, which provide the best opportunity for maximum speedup. However, these facilities may not be readily available to many users. This thesis explores the merging of these distinct computational domains: the execution of parallel discrete event simulation across loosely coupled resources. A master/worker architecture for parallel discrete event simulation is proposed that provides robust execution under a dynamic set of services, with system-level support for fault tolerance, semi-automated client-directed load balancing, portability across heterogeneous machines, and the ability to run codes on idle or time-shared clients without significant user interaction. Results indicate that a master/worker approach using loosely coupled resources is a viable means of high-throughput parallel discrete event simulation, either by enhancing existing computational capacity or by providing an alternate execution capability for less time-critical codes. Research questions and challenges concerning the limitations of the work distribution paradigm, the targeted computational domain, performance metrics, and the intended class of applications are analyzed and discussed. A portable web services approach to master/worker parallel discrete event simulation is proposed and evaluated.
Optimizations that increase the efficiency of large-scale simulation execution, through distributed master service design and reduction of intrinsic overhead, are proposed and evaluated. Finally, challenges for optimistic parallel discrete event simulation, such as rollbacks and message unsending, are addressed and evaluated under an inherently different computation paradigm that uses master services and time windows.
