Multidimensional Index for Highly Clustered Data with Large Density Contrasts

The SDSS (Sloan Digital Sky Survey) archive will contain multicolor data for over 100 million galaxies, a volume of close to a terabyte. Efficient searches in multicolor space will require novel indexing. Techniques such as the k-d tree are applicable but less than optimal: most of the data is highly clustered, while a small fraction is randomly distributed, so the cells become very unbalanced. We propose to resolve this by implementing an algorithm that splits the population into two parts, one of high density and one of low density.
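The following is a minimal sketch of the two-part idea, not the proposed algorithm itself. The abstract does not say how density is measured or how each part is indexed. This sketch therefore assumes two things: local density is a point count in a coarse fixed grid, and each part gets its own plain median-split k-d tree. The parameters `cell_size`, `threshold` and `leaf_size` and the helpers `density_split`, `build_kdtree` and `query` are hypothetical names chosen for illustration. A range query visits both trees and merges the results.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def density_split(points: Sequence[Point], cell_size: float, threshold: int):
    """Hypothetical density test: a point is 'dense' if its grid cell
    (side length cell_size) holds at least `threshold` points."""
    def cell(p: Point) -> Tuple[int, ...]:
        return tuple(math.floor(x / cell_size) for x in p)

    counts = Counter(cell(p) for p in points)
    dense = [p for p in points if counts[cell(p)] >= threshold]
    sparse = [p for p in points if counts[cell(p)] < threshold]
    return dense, sparse


@dataclass
class KDNode:
    points: Optional[List[Point]] = None  # set only on leaves
    axis: int = 0
    split: float = 0.0
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build_kdtree(points: List[Point], leaf_size: int = 8, depth: int = 0) -> KDNode:
    """Median-split k-d tree cycling through the axes (leaf_size must be >= 1)."""
    if len(points) <= leaf_size:
        return KDNode(points=list(points))
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    # Left half has coordinate <= split, right half has coordinate >= split.
    return KDNode(
        axis=axis,
        split=pts[mid][axis],
        left=build_kdtree(pts[:mid], leaf_size, depth + 1),
        right=build_kdtree(pts[mid:], leaf_size, depth + 1),
    )


def range_query(node: KDNode, lo: Point, hi: Point, out: List[Point]) -> None:
    """Append every point p in the subtree with lo <= p <= hi componentwise."""
    if node.points is not None:
        out.extend(p for p in node.points
                   if all(l <= x <= h for x, l, h in zip(p, lo, hi)))
        return
    if lo[node.axis] <= node.split:
        range_query(node.left, lo, hi, out)
    if hi[node.axis] >= node.split:
        range_query(node.right, lo, hi, out)


def query(trees: Sequence[KDNode], lo: Point, hi: Point) -> List[Point]:
    """Search the high-density and low-density trees and merge the hits."""
    out: List[Point] = []
    for tree in trees:
        range_query(tree, lo, hi, out)
    return out


if __name__ == "__main__":
    cluster = [(0.1 + 0.001 * i, 0.2 + 0.001 * i) for i in range(50)]
    outliers = [(5.0, 5.0), (-3.0, 7.0), (9.5, -2.0)]
    dense, sparse = density_split(cluster + outliers, cell_size=1.0, threshold=10)
    trees = [build_kdtree(dense), build_kdtree(sparse)]
    print(len(dense), len(sparse), query(trees, (4.0, 4.0), (6.0, 6.0)))
```

In this sketch the dense tree splits only clustered points, so its cells stay balanced. The few scattered points live in a separate small tree and no longer force large, nearly empty cells around the cluster. A real implementation would need a better density estimate than a fixed grid.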