Efficient Parallel Discrete Event Simulation on Cloud/Virtual Machine Platforms

Cloud and Virtual Machine (VM) technologies present new challenges with respect to performance and monetary cost in executing parallel discrete event simulation (PDES) applications. Because overall cost is now a metric, the traditional choice of the highest-end computing configuration is no longer the obvious one. Moreover, the unique runtime dynamics and configuration choices of Cloud and VM platforms introduce new design considerations and runtime characteristics specific to PDES over Cloud/VMs. Here, an empirical study is presented to help understand the dynamics, trends, and trade-offs in executing PDES on Cloud/VM platforms. Performance and cost measures obtained from multiple PDES applications executed on the Amazon EC2 Cloud and on a high-end VM host machine reveal new, counterintuitive VM-PDES dynamics and guidelines. One of the critical aspects uncovered is the fundamental mismatch between hypervisor scheduler policies designed for general Cloud workloads and the virtual time ordering needed for PDES workloads. This insight is supported by experimental data revealing a gross deterioration in PDES performance traceable to VM scheduling policy. To overcome this fundamental problem, the design and implementation of a new deadlock-free scheduler algorithm, optimized specifically for PDES applications on VMs, are presented. The scalability of the scheduler has been tested on up to 128 VMs multiplexed on 32 cores, showing significant improvement in runtime relative to the default Cloud/VM scheduler. The observations, algorithmic design, and results are timely for emerging Cloud/VM-based installations, highlighting the need for PDES-specific support in high-performance discrete event simulations on Cloud/VM platforms.
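The mismatch between fair-share scheduling and virtual time ordering can be illustrated with a toy model. The sketch below is not the scheduler presented in this work; it is a minimal illustration under stated assumptions: a single core, one logical process per VM, a fixed virtual-time step per VM, and a conservative lookahead window. The names `run`, `steps`, `lookahead`, and `end_time` are hypothetical. A slice is counted as wasted when the scheduled VM cannot advance, either because it is too far ahead of the slowest VM in virtual time or because it has already finished.

```python
def run(policy, steps, lookahead, end_time):
    """Toy single-core model (illustrative assumption, not the paper's algorithm).

    policy    -- "round_robin" (fair share) or "least_lvt" (virtual-time ordered)
    steps     -- virtual-time advance of each VM per useful time slice
    lookahead -- how far ahead of the slowest VM a VM may safely run
    end_time  -- virtual time at which the simulation ends
    Returns (total_slices, wasted_slices).
    """
    n = len(steps)
    lvt = [0] * n          # local virtual time of each VM
    slices = wasted = 0
    next_vm = 0            # round-robin cursor

    while min(lvt) < end_time:
        if policy == "round_robin":
            vm = next_vm
            next_vm = (next_vm + 1) % n
        elif policy == "least_lvt":
            # Always schedule the VM that is furthest behind in virtual time.
            vm = min(range(n), key=lambda i: lvt[i])
        else:
            raise ValueError(f"unknown policy: {policy}")

        slices += 1
        can_advance = lvt[vm] < end_time and lvt[vm] <= min(lvt) + lookahead
        if can_advance:
            lvt[vm] += steps[vm]
        else:
            wasted += 1    # the VM held the core but made no progress

    return slices, wasted


if __name__ == "__main__":
    for policy in ("round_robin", "least_lvt"):
        total, idle = run(policy, steps=[1, 2, 4, 8], lookahead=2, end_time=64)
        print(f"{policy}: {total} slices, {idle} wasted")
```

In this toy setting, both policies perform the same amount of useful work, but the fair-share policy additionally spends slices on VMs that are blocked waiting for slower VMs, whereas the virtual-time-ordered policy always schedules a VM that can make progress. Real hypervisor schedulers and PDES workloads are considerably more complex, so the example should be read only as an intuition aid.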
