An Empirical Comparison of Runtime Systems for Conservative Parallel Simulation

A main consideration when implementing a parallel simulation application is the choice of parallel simulation protocol (conservative vs. optimistic). Given a particular protocol, the application programmer must then choose a suitable parallel runtime system on which to implement the application. For an optimistic protocol, several parallel simulation libraries intended for application programmers are available (e.g., GTW, Warped). For a conservative protocol, the most effective approach is for the programmer to use a general parallel runtime library and implement optimizations specific to the simulation application and/or model. In this paper, we select four general parallel runtime libraries potentially relevant to parallel simulation and implement a conservative protocol on each of them. We compare the four libraries on three main aspects: (a) programmability, (b) performance, and (c) mechanisms for performance tuning. Our target platforms are machines supporting shared address spaces (e.g., SGI Origin200, Sun Enterprise 3000), and we obtained performance figures from a 4-CPU Ultra2 Sun Enterprise 3000. Our experiments show that POSIX, though an industry standard, still has relatively high overheads and cannot efficiently support a protocol with fine-grained logical processes (LPs). The research libraries all show speedups on 4 processors, but to different extents. Cilk's speedup curves improve with larger thread granularity, while Active Threads shows relatively good speedup even for small thread granularity. BSP processes are naturally coarse-grained and thus achieve good speedup in our simulation application.