GLORY-DB: A Distributed Data Management System for Large-Scale High-Dimensional Data

Recently, the proliferation of the web and of digital photography has created the need for two things: a distributed storage system that can manage large-scale data, and an indexing technique that supports efficient nearest neighbor search on high-dimensional data. One of the most challenging problems in distributed data management and image processing is scalability with respect to both data volume and the number of machines. In particular, for large-scale image clustering problems whose data cannot fit on a single machine, traditional nearest neighbor search cannot be applied. This paper presents the design of GLORY-DB, a highly available and scalable distributed data management and storage system that provides content-based retrieval using a hybrid spill tree with local signature files. We describe our scalable index structure and how it can be used to find nearest neighbors in a cluster environment.
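The abstract names a hybrid spill tree without giving its construction. As an illustration only, the sketch below follows the generic single-machine hybrid spill tree described in the nearest neighbor literature (Liu et al., 2004), not GLORY-DB's actual implementation. Each internal node projects its points onto the line through two far-apart pivots and splits at the median. It is an overlapping "spill" node when neither child would receive more than a fraction `rho` of the points under an overlap buffer of width `tau`; otherwise it falls back to a disjoint metric-tree split. Search is defeatist (no backtracking) at spill nodes and backtracking at metric nodes. The names `build`, `nearest`, `tau`, `rho`, `leaf_size`, and the pivot heuristic are hypothetical choices for this sketch. Distribution across machines and the local signature files are not modeled.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    points: Optional[List[Point]] = None   # set only on leaves
    a: Optional[Point] = None              # pivot used as the projection origin
    u: Optional[Tuple[float, ...]] = None  # unit vector from pivot a toward pivot b
    mid: float = 0.0                       # split value on the projection axis
    spill: bool = False                    # True: overlapping node, searched without backtracking
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def project(x, a, u):
    """Scalar projection of x onto the pivot axis (origin a, unit direction u)."""
    return sum((xi - ai) * ui for xi, ai, ui in zip(x, a, u))

def build(points, tau=0.1, rho=0.7, leaf_size=8):
    """Hypothetical hybrid spill tree builder; tau = overlap buffer, rho = max child fraction."""
    assert 0.0 <= rho < 1.0  # rho < 1 guarantees every child is strictly smaller
    if len(points) <= leaf_size:
        return Node(points=list(points))
    # Cheap pivot heuristic: farthest point from points[0], then farthest from that.
    a = max(points, key=lambda p: math.dist(points[0], p))
    b = max(points, key=lambda p: math.dist(a, p))
    span = math.dist(a, b)
    if span == 0.0:  # all points identical, nothing to split
        return Node(points=list(points))
    u = tuple((bi - ai) / span for ai, bi in zip(a, b))
    pairs = sorted(((project(p, a, u), p) for p in points), key=lambda t: t[0])
    n = len(pairs)
    mid = pairs[n // 2][0]
    left = [p for s, p in pairs if s <= mid + tau]   # points inside the buffer go to both sides
    right = [p for s, p in pairs if s >= mid - tau]
    spill = max(len(left), len(right)) <= rho * n
    if not spill:
        # Overlap too large: use a disjoint metric-tree split into two halves.
        left = [p for _, p in pairs[: n // 2]]
        right = [p for _, p in pairs[n // 2 :]]
    return Node(a=a, u=u, mid=mid, spill=spill,
                left=build(left, tau, rho, leaf_size),
                right=build(right, tau, rho, leaf_size))

def _search(node, q, best):
    if node.points is not None:  # leaf: linear scan
        for p in node.points:
            d = math.dist(q, p)
            if d < best[0]:
                best[0], best[1] = d, p
        return
    qp = project(q, node.a, node.u)
    near, far = (node.left, node.right) if qp < node.mid else (node.right, node.left)
    _search(near, q, best)
    # Metric node only: |qp - mid| lower-bounds the distance to any point across the split.
    if not node.spill and abs(qp - node.mid) < best[0]:
        _search(far, q, best)

def nearest(tree, q):
    """Return (point, distance); approximate when the tree contains spill nodes."""
    best = [math.inf, None]
    _search(tree, q, best)
    return best[1], best[0]

if __name__ == "__main__":
    random.seed(0)
    data = [tuple(random.random() for _ in range(16)) for _ in range(2000)]
    query = tuple(random.random() for _ in range(16))
    approx_point, approx_dist = nearest(build(data), query)
    exact_dist = min(math.dist(query, p) for p in data)
    print(approx_dist, exact_dist)  # approx_dist >= exact_dist
```

Setting `rho=0.0` disables spill nodes, so the sketch degenerates into an exact metric tree. Larger `tau` raises accuracy at the cost of duplicated points and more memory, which is the trade-off a hybrid spill tree is designed to tune.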