Exploiting Sample-Data Distributions to Reduce the Cost of Nearest-Neighbor Searches with Kd-Trees

We present KD-DT, an algorithm that uses a decision-tree-inspired measure to build a kd-tree for low cost nearest-neighbor searches. The algorithm starts with a "standard" kd-tree and uses searches over a training set to evaluate and improve the structure of the kd-tree. In particular, the algorithm builds a tree that better insures that a query and its nearest neighbors will be in the same subtree(s), thus reducing the cost of subsequent search.