A Modified Parallel Tree Code for N-Body Simulation of the Large-Scale Structure of the Universe

N-body codes for simulating the origin and evolution of the large-scale structure of the universe have improved significantly over the past decade, both in the resolution achieved and in the reduction of CPU time. However, state-of-the-art N-body codes hardly allow one to deal with particle numbers larger than a few $10^7$, even on the largest parallel systems. To allow simulations with higher resolution, we have first reconsidered the grouping strategy described in J. Barnes (1990, J. Comput. Phys. 87, 161) (hereafter B90) and applied it, with some modifications, to our WDSH-PT (Work and Data SHaring-Parallel Tree) code (U. Becciani et al., 1996, Comput. Phys. Comm. 99, 1). In the first part of this paper we give a short description of the code, which adopts the algorithm of J. E. Barnes and P. Hut (1986, Nature 324, 446), and in particular of the memory and work distribution strategy used to distribute the data on a CC-NUMA machine such as the CRAY-T3E system. In very large simulations (typically $N \geq 10^7$), network contention and the formation of clusters of galaxies easily lead to an uneven load. To remedy this, we have devised an automatic work redistribution mechanism that provides a good dynamic load balance without adding significant overhead. In the second part of the paper we describe the modification to the Barnes grouping strategy that we have devised to improve the performance of the WDSH-PT code. We use the property that nearby particles have similar interaction lists. This idea has been checked in B90, where an interaction list is built which applies everywhere within a cell $C_{\mathrm{group}}$ containing a small number of particles $N_{\mathrm{crit}}$. B90 reuses this interaction list for each particle $p \in C_{\mathrm{group}}$ in the cell in turn. We assume each particle $p$ to have the same interaction list. We consider that the force $F_p$ acting on a particle $p$ can be decomposed into two terms, $F_p = F_{\mathrm{far}} + F_{\mathrm{near}}$. The first term $F_{\mathrm{far}}$ is the same for each particle in the cell and is generated by the interaction between a hypothetical particle placed at the center of mass of $C_{\mathrm{group}}$ and the farther cells contained in the interaction list. $F_{\mathrm{near}}$ is different for each particle $p$ and is generated by the interaction between $p$ and the elements near $C_{\mathrm{group}}$. In this way it has been possible to reduce the CPU time and increase the code performance, which enables us to run simulations with a large number of particles ($N \sim 10^7$ to $10^9$) in nonprohibitive CPU times.
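
The far/near decomposition described above can be illustrated with a minimal Python sketch. It is not the WDSH-PT implementation: the function names (`accel_from`, `center_of_mass`, `group_accelerations`), the `(position, mass)` tuple layout, the choice of units with G = 1, the optional softening `eps`, and the treatment of far cells as point masses (monopoles) are all assumptions made for illustration. The sketch works with accelerations (force per unit mass), so that the far term is literally the same for every particle in the group.

```python
import math

def accel_from(source_pos, source_mass, target_pos, eps=0.0):
    """Acceleration at target_pos due to a point mass (assumed units: G = 1)."""
    d = [s - t for s, t in zip(source_pos, target_pos)]
    r2 = sum(c * c for c in d) + eps * eps
    if r2 == 0.0:
        # Source and target coincide: no self-interaction.
        return [0.0, 0.0, 0.0]
    inv_r3 = source_mass / (r2 * math.sqrt(r2))
    return [c * inv_r3 for c in d]

def center_of_mass(particles):
    """Center of mass of a list of (position, mass) tuples."""
    total = sum(m for _, m in particles)
    return [sum(m * pos[k] for pos, m in particles) / total for k in range(3)]

def group_accelerations(group, far_cells, near_elements, eps=0.0):
    """Return one acceleration per group particle as a_far + a_near.

    group, far_cells and near_elements are lists of (position, mass) tuples.
    """
    # Far term: evaluated once, at the group's center of mass,
    # and shared by every particle in the group.
    com = center_of_mass(group)
    a_far = [0.0, 0.0, 0.0]
    for pos, m in far_cells:
        a = accel_from(pos, m, com, eps)
        a_far = [x + y for x, y in zip(a_far, a)]

    # Near term: evaluated separately at each particle's own position.
    result = []
    for p_pos, _ in group:
        a_near = [0.0, 0.0, 0.0]
        for pos, m in near_elements:
            a = accel_from(pos, m, p_pos, eps)
            a_near = [x + y for x, y in zip(a_near, a)]
        result.append([f + n for f, n in zip(a_far, a_near)])
    return result
```

In this sketch the far-cell loop runs once per group rather than once per particle, which is where the CPU-time saving would come from; the price is that the far term ignores each particle's offset from the group's center of mass.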