A Study of Update Methods for BoND-Tree Index on Non-ordered Discrete Vector Data

There is an increasing demand from numerous applications, such as bioinformatics and cybersecurity, to efficiently process various types of queries on datasets in a multidimensional Non-ordered Discrete Data Space (NDDS). An NDDS consists of vectors whose values in each dimension come from a non-ordered discrete domain. The BoND-tree index was recently developed to efficiently process box queries on a large disk-resident dataset from an NDDS. The original work on the BoND-tree focused on developing the index construction and query algorithms. No work has been reported on efficient and effective update strategies for the BoND-tree. In this paper, we study two update methods, based on two different strategies, for updating the index tree in an NDDS. Our study shows that the bottom-up update method can provide improved efficiency compared with the traditional top-down update method, especially when the number of dimensions of a vector that need to be updated is small. Our study also shows that the two update methods have comparable effectiveness, which indicates that the bottom-up update method is generally more advantageous.
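To make the terminology concrete, the following minimal sketch illustrates what an NDDS vector, a box query, and the set of updated dimensions might look like. It is a plain linear scan over an in-memory list, not the BoND-tree or either of the update methods studied here; the names `in_box`, `box_query`, and `changed_dimensions`, as well as the DNA-like alphabet, are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Set

# A vector in an NDDS: one value per dimension, each drawn from a
# non-ordered discrete domain (here, hypothetically, the letters A, C, G, T).
Vector = Sequence[str]

# A box query: for each dimension, the set of values that are accepted.
Box = Sequence[Set[str]]


def in_box(vector: Vector, box: Box) -> bool:
    """Return True if every component of the vector lies in the box's
    value set for the corresponding dimension."""
    return len(vector) == len(box) and all(
        value in allowed for value, allowed in zip(vector, box)
    )


def box_query(dataset: Sequence[Vector], box: Box) -> List[Vector]:
    """Answer a box query by linear scan (illustration only, no index)."""
    return [vector for vector in dataset if in_box(vector, box)]


def changed_dimensions(old: Vector, new: Vector) -> List[int]:
    """Return the indices of the dimensions in which an update changes
    the vector."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


dataset = ["ACGT", "ACGA", "TCGT"]
box = [{"A"}, {"C"}, {"G"}, {"A", "T"}]
matches = box_query(dataset, box)
updated = changed_dimensions("ACGT", "ACGA")
```

In this toy example the box accepts the first two vectors, and updating `"ACGT"` to `"ACGA"` changes only the last dimension, which is the kind of small update for which the abstract reports the largest efficiency gain from the bottom-up method.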
