Efficiency of hybrid index structures - Theoretical analysis and a practical application

Hybrid index structures support access to heterogeneous data types in multiple columns. Several experiments confirm the improved efficiency of these hybrid access structures, yet very little is known about their worst-case time and space complexity. This paper aims to close this gap by introducing a theoretical framework that supports the analysis of hybrid index structures. The framework is then used to derive the constraints for an access structure that is both time and space efficient. The outcome of the analysis is an access structure based on a B+-Tree augmented with bit lists representing sets of terms from texts. It is validated experimentally, together with a hybrid R-Tree variant, to show a logarithmic search time complexity.

Highlights
We define a theory to evaluate indices combining different data types.
We design and validate a hybrid index based on B+-Trees, an R-Tree and bit lists.
The hybrid index focusses on efficiently combining single- and multi-valued data.
The hybrid access structure can achieve a logarithmic time complexity.
Furthermore, the new indexing mechanism guarantees a linear space complexity.
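To make the idea of an ordered index augmented with bit lists more concrete, the following is a minimal sketch in Python. It is not the structure analysed in the paper: the names `Node`, `term_bits`, `build` and `search`, the static bottom-up construction, and the `fanout` parameter are all assumptions made for illustration. Each node stores the key range of its subtree and the bitwise OR of the term bit lists below it, so a combined range-and-keyword query can skip subtrees that cannot contain all requested terms. The sketch makes no claim about the time or space bounds derived in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    lo: int                      # smallest key in this subtree
    hi: int                      # largest key in this subtree
    bits: int                    # OR of the term bit lists of all records below
    children: List["Node"] = field(default_factory=list)
    record: Optional[Tuple[int, str]] = None   # set on leaves only


def term_bits(terms: List[str], vocab: Dict[str, int]) -> int:
    """Encode a set of terms as a bit list, assigning new bit positions on demand."""
    bits = 0
    for t in terms:
        bits |= 1 << vocab.setdefault(t, len(vocab))
    return bits


def build(records: List[Tuple[int, str]], vocab: Dict[str, int], fanout: int = 4) -> Node:
    """Build a static tree bottom-up over (key, text) records (hypothetical helper)."""
    if not records:
        raise ValueError("records must not be empty")
    level = [Node(k, k, term_bits(text.lower().split(), vocab), record=(k, text))
             for k, text in sorted(records)]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            bits = 0
            for child in group:
                bits |= child.bits
            parents.append(Node(group[0].lo, group[-1].hi, bits, children=group))
        level = parents
    return level[0]


def search(root: Node, lo: int, hi: int, terms: List[str],
           vocab: Dict[str, int]) -> List[Tuple[int, str]]:
    """Return records with lo <= key <= hi whose text contains every term."""
    if any(t not in vocab for t in terms):
        return []                # an unknown term cannot match any record
    need = 0
    for t in terms:
        need |= 1 << vocab[t]
    out = []
    stack = [root]
    while stack:
        n = stack.pop()
        # Prune on the key range and on the bit list in a single step.
        if n.hi < lo or n.lo > hi or (n.bits & need) != need:
            continue
        if n.record is not None:
            out.append(n.record)
        else:
            stack.extend(reversed(n.children))   # keep key order in the output
    return out


vocab: Dict[str, int] = {}
root = build([(10, "red house"), (20, "blue house"), (30, "red car"),
              (40, "green house red"), (50, "blue car")], vocab, fanout=2)
```

With this sample data, `search(root, 15, 45, ["red"], vocab)` visits only subtrees whose key range overlaps 15 to 45 and whose bit list contains the bit for "red". An inner node's bit list can be a false positive, since its terms may be spread over several records, so the exact test happens at the leaves.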
