The PH-tree: a space-efficient storage structure and multi-dimensional index

We propose the PATRICIA-hypercube-tree, or PH-tree, a multi-dimensional data storage and indexing structure. It is based on binary PATRICIA-tries combined with hypercubes for efficient data access. Space efficiency is achieved by combining prefix sharing with a space-optimised implementation. As a result, its storage requirements are comparable to, or lower than, those of storing the same data in non-index structures such as arrays of objects. The storage structure also serves as a multi-dimensional index on all dimensions of the stored data, which enables efficient access via point and range queries. We explain the concept of the PH-tree, demonstrate the performance of a sample implementation on various datasets, and compare it to other spatial indices such as the kD-tree. The experiments show that for larger datasets beyond 10^7 entries, the PH-tree increasingly and consistently outperforms other structures in terms of space efficiency, query performance and update performance.
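The combination of binary tries and hypercubes can be illustrated with a small sketch. It is not taken from the sample implementation; it assumes keys are k-dimensional points of unsigned, fixed-width integers, and the function names `hypercube_address` and `shared_prefix_bits` are hypothetical. The first function takes one bit from each dimension at a given bit position and combines them into a k-bit address, which is one plausible way to select a slot in a 2^k hypercube. The second counts the leading bit positions on which two points agree in every dimension, which is the part that prefix sharing could store only once.

```python
from typing import Sequence


def hypercube_address(point: Sequence[int], bit_pos: int) -> int:
    """Combine the bit at bit_pos (0 = least significant) of every
    dimension into a single k-bit address, first dimension first."""
    addr = 0
    for value in point:
        addr = (addr << 1) | ((value >> bit_pos) & 1)
    return addr


def shared_prefix_bits(a: Sequence[int], b: Sequence[int], width: int) -> int:
    """Count the leading bit positions (from the most significant bit)
    at which a and b agree in all dimensions. Assumes width-bit values."""
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # a set bit marks a position where some dimension differs
    return width - diff.bit_length()


if __name__ == "__main__":
    p = (0b0101, 0b0011)  # a 2-dimensional point with 4-bit values
    q = (0b0111, 0b0010)
    # Addresses from the most significant bit down to the least significant bit.
    print([hypercube_address(p, pos) for pos in (3, 2, 1, 0)])
    print(shared_prefix_bits(p, q, width=4))
```

In this sketch, two points that share a long bit prefix would follow the same path for that many levels and diverge only at the first position where their hypercube addresses differ.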
