Design and Implementation of Cover Tree Algorithm on CUDA-Compatible GPU

Recently developed architectures such as the Compute Unified Device Architecture (CUDA) allow us to exploit the computational power of graphics processing units (GPUs). In this paper we propose an algorithm for implementing the cover tree, accelerated on the GPU. The existing cover tree implementation targets a single-core CPU and is not suitable for applications with large data sets, such as phylogenetic analysis in bioinformatics, that need to find nearest neighbours in real time. As far as we know, this is the first attempt to implement the cover tree on a GPU. The proposed algorithm has been implemented using CUDA, which is available on NVIDIA GPUs. It makes efficient use of on-chip shared memory to reduce the amount of data transferred between off-chip memory and the processing elements of the GPU. Furthermore, our algorithm provides a model for implementing other distance trees on the GPU. We present experimental results comparing the proposed algorithm with its execution on a pre-existing single-core architecture. The results show that the proposed algorithm achieves a significant speedup compared to the single-core execution of the same code.