New high-dimensional access methods based on superimposed space-partitioning schemes

Contemporary scientific and engineering applications increasingly rely on advanced database technology to store their data and to search it over multi-attribute ranges. With the rapid increase in the volume and dimensionality of data, new multi-dimensional access methods are required to support these applications. This dissertation presents the results of a systematic study conducted to identify and pursue the key factors of good retrieval performance in high-dimensional spaces. The desirable properties of high-dimensional access methods are formulated as two design principles. The research shows that these properties depend on the space-partitioning strategy underlying the access method, and that existing partitioning schemes do not exhibit them. Based on this finding, a new partitioning strategy for high-dimensional spaces, called Γ, is proposed. The static Γ partitioning is used to impose the desirable behavior on existing partitioning strategies. This idea of superimposed partitioning leads to a family of new access methods organized around the two design principles. Thorough experimental studies show significant performance improvements over several popular access methods, demonstrating the viability of the proposed access methods and the validity of the underlying approach. The emergence of advanced data-intensive analytical computing has also created a demand for efficient indexing of partially specified (incomplete) multi-dimensional data. The dissertation introduces new techniques for indexing incomplete data, which exploit key aspects of superimposed space partitioning. Challenging conventional wisdom, this investigation shows that multi-dimensional retrieval can be efficient, scalable, and effective even on incomplete data.