Performance of N-body codes on hybrid machines

Abstract

N-body codes are routinely used for simulation studies of physical systems, e.g. in computational astrophysics and molecular dynamics. Typically, they require only a moderate amount of run-time memory but are very demanding in computational power. A detailed analysis of the performance of an N-body code, in terms of the relative weight of each task of the code and how this weight is influenced by software or hardware optimisations, is essential for improving such codes. The approach of developing a dedicated device, GRAPE [J. Makino, M. Taiji, Scientific Simulations with Special Purpose Computers, Wiley, New York, 1998], able to provide very high performance for the most expensive computational task of such a code, has resulted in a dramatic performance leap. We explore the performance of different versions of parallel N-body codes in which both software and hardware improvements are introduced. The use of GRAPE as a ‘force computation accelerator’ in a parallel computer architecture can be seen as an example of a hybrid architecture, where special purpose device boards help a general purpose (multi)computer to reach very high performance.