Inflated speedups in parallel simulations via malloc()

Abstract: Discrete-event simulation programs make heavy use of dynamic memory allocation to support simulation's highly dynamic space requirements. When programming in C, one typically uses the malloc() routine. However, a parallel simulation that uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternative implementation, provided on some (but not all) systems, avoids the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. This paper illustrates the problem, then shows how a simply implemented, user-constructed interface to malloc() can both avoid artificially inflated speedups and make efficient use of the dynamic memory space. The interface simply caches freed blocks on the basis of their size. We demonstrate the problem empirically, and show the effectiveness of our solution both empirically and analytically.
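
To make the caching idea concrete, the following is a minimal C sketch of a size-based caching front end layered over malloc(). It is not the paper's actual interface: the names cached_alloc, cached_free, and MAX_CACHED_SIZE are illustrative, and it assumes the caller supplies the block size on free. Freed blocks are kept on one free list per size, so allocation and deallocation of cached sizes take constant time regardless of how fragmented the heap becomes.

    /*
     * Hypothetical size-based caching interface to malloc().
     * Freed blocks are threaded through their own storage, so the
     * cache costs no space beyond the array of list heads.
     */
    #include <stdlib.h>

    #define MAX_CACHED_SIZE 256   /* cache small blocks; larger ones go straight to malloc/free */

    static void *free_list[MAX_CACHED_SIZE + 1];   /* one free list per block size */

    void *cached_alloc(size_t size)
    {
        if (size <= MAX_CACHED_SIZE && free_list[size] != NULL) {
            void *block = free_list[size];
            free_list[size] = *(void **)block;     /* pop from the per-size list */
            return block;
        }
        /* Cache miss: fall through to the system allocator.  Blocks must
           be large enough to hold the link pointer once freed. */
        return malloc(size < sizeof(void *) ? sizeof(void *) : size);
    }

    void cached_free(void *block, size_t size)
    {
        if (size <= MAX_CACHED_SIZE) {
            *(void **)block = free_list[size];     /* push onto the per-size list */
            free_list[size] = block;
        } else {
            free(block);
        }
    }

The design choice, under these assumptions, is that a per-size cache turns the common alloc/free pattern of a simulation (repeatedly recycling a few fixed-size event and message records) into constant-time list operations, sidestepping the free-list search costs that can vary between the sequential and parallel runs and thereby distort measured speedup.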