Efficiently Indexing High-Dimensional Data Spaces

Indexing high-dimensional data spaces is an emerging research domain. Its importance is growing with the need to support modern applications with powerful search tools. In so-called non-standard database applications such as multimedia, CAD, molecular biology, medical imaging, and time series processing, similarity search in large data sets is required as basic functionality. A widely applied technique for similarity search is feature transformation, in which important properties of the database objects are mapped to points of a multidimensional vector space, the so-called feature vectors. Similarity queries thus translate naturally into neighborhood queries in the feature space. To achieve high query-processing performance, multidimensional index structures are used to manage the feature vectors. Unfortunately, the performance of multidimensional index structures deteriorates as the dimension of the data space increases, both because they are primarily designed for low-dimensional data spaces and because of a number of effects usually called the ‘curse of dimensionality’. The general goal of this thesis is therefore to improve the efficiency of index-based query processing in high-dimensional data spaces. For this purpose, a cost model for index-based query processing in high-dimensional data spaces was developed. It is applicable to a variety of index structures and query processing techniques and can be used both to evaluate techniques and for optimization. Based on this cost model, a variety of improvement and optimization techniques for multidimensional index structures were developed. The first, called the DABS-tree, involves a cost-model-based split algorithm that supports dynamic and local adaptation of the block size of the index structure. Dynamic block size adaptation is especially useful because, as we can show, conventional index structures often access data in portions that are too small.
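
To make the feature-transformation idea concrete, the following minimal Python sketch treats a similarity query as a k-nearest-neighbor query over feature vectors. It is an illustration only, not the method of the thesis: the object names, the three-dimensional feature vectors, the Euclidean metric, and the helper names `euclidean` and `k_nearest` are all assumptions made for this example, and the linear scan stands in for the work a multidimensional index structure would try to avoid.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(
    features: Dict[str, Sequence[float]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k objects whose feature vectors lie closest to the query.

    This is a plain sequential scan over all feature vectors; an index
    structure would aim to prune most of these distance computations.
    """
    scored = [(name, euclidean(vec, query)) for name, vec in features.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical feature vectors (for instance, coarse color histograms).
features = {
    "obj_a": (0.9, 0.1, 0.0),
    "obj_b": (0.8, 0.2, 0.0),
    "obj_c": (0.0, 0.1, 0.9),
}

# The similarity query becomes a neighborhood query around the query vector.
result = k_nearest(features, (1.0, 0.0, 0.0), k=2)
print(result)
```

With these made-up vectors, the two closest objects to the query are `obj_a` and `obj_b`, in that order. The cost of the scan grows linearly with the number of objects, which is the baseline that index-based query processing is meant to beat.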
