Barnes-Hut Treecode on GPU

General-purpose computation on graphics processing units (GPGPU) has become a popular field of study. Owing to their high computing capacity and relatively low price, GPUs have become an attractive platform for many scientific applications, among them N-body simulation. According to published work, the simple O(N^2) algorithm for N-body simulation has been accelerated on GPUs with some success, but tree algorithms have not performed well on GPUs. This paper proposes a new implementation of the tree algorithm on the GPU using CUDA, which achieves a speedup of more than 100x in computing the forces between bodies. This paper also introduces a new method for building the tree in this algorithm, which improves performance further.