Paper information - Restructuring a parallel simulation to improve cache behavior in a shared-memory multiprocessor: the value of distributed synchronization

Synchronization is a significant cost in many parallel programs, and can be a major bottleneck if it is handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-stepped simulation, the use of global synchronization primitives limits scalability, increases sensitivity to load imbalance, and reduces the potential for exploiting locality to improve cache behavior. This paper presents the results of an initial one-application study quantifying the costs and performance benefits of distributed, nearest-neighbor synchronization. The application studied, MP3D, is a particle-based wind tunnel simulation. Our results for this one application on current shared-memory multiprocessors show a significant decrease in synchronization time using these techniques. We prototyped an application-independent library that implements distributed synchronization. The library allows a variety of parallel simulations to exploit these techniques without increasing the application programming effort beyond that of conventional approaches.
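The contrast between a global barrier and nearest-neighbor synchronization can be illustrated with a minimal sketch. It is not MP3D and not the library described in the abstract. It is a toy one-dimensional smoothing problem, and every name in it (`NeighborSync`, `smooth`, `run_distributed`, `run_sequential`) is hypothetical. Each worker owns one cell and, before computing step s, waits only until its immediate neighbors have completed step s-1, so no worker waits on the whole machine.

```python
import threading


class NeighborSync:
    """Per-worker step counters; a worker waits only on its own neighbors."""

    def __init__(self, n_workers):
        self.n = n_workers
        self.done = [0] * n_workers  # steps completed by each worker
        self.conds = [threading.Condition() for _ in range(n_workers)]

    def neighbors(self, i):
        # Assumption: workers form a line, so the ends have one neighbor.
        return [j for j in (i - 1, i + 1) if 0 <= j < self.n]

    def finish_step(self, i):
        # Publish progress on this worker's own condition variable.
        with self.conds[i]:
            self.done[i] += 1
            self.conds[i].notify_all()

    def wait_for_neighbors(self, i, step):
        # Block until every neighbor has completed at least `step` steps.
        for j in self.neighbors(i):
            with self.conds[j]:
                self.conds[j].wait_for(lambda: self.done[j] >= step)


def smooth(prev, i):
    # Toy update rule: average of the cell and its existing neighbors.
    window = prev[max(i - 1, 0): i + 2]
    return sum(window) / len(window)


def run_distributed(initial, steps):
    n = len(initial)
    # Two buffers indexed by step parity. A neighbor can be at most one
    # step ahead, so it never overwrites a value that is still needed.
    bufs = [list(initial), [0.0] * n]
    sync = NeighborSync(n)

    def worker(i):
        for step in range(1, steps + 1):
            sync.wait_for_neighbors(i, step - 1)
            bufs[step % 2][i] = smooth(bufs[(step - 1) % 2], i)
            sync.finish_step(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bufs[steps % 2]


def run_sequential(initial, steps):
    # Reference version: every step implicitly acts as a global barrier.
    cur = list(initial)
    for _ in range(steps):
        cur = [smooth(cur, i) for i in range(len(cur))]
    return cur
```

Calling `run_distributed([0.0, 0.0, 8.0, 0.0, 0.0], 4)` should give the same values as `run_sequential` on the same input, while letting workers far apart in the line drift several steps from one another. Python threads here only illustrate the dependency structure; they say nothing about the cache or timing effects measured in the paper.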

Philip Machanick | Hendrik A. Goosen | David R. Cheriton | Hugh Holbrook
