Exploring Deletion Strategies for the BoND-Tree in Multidimensional Non-ordered Discrete Data Spaces

A box query on a dataset in a multidimensional data space specifies a set of allowed values for each dimension. Indexing a dataset in a multidimensional Non-ordered Discrete Data Space (NDDS) to support efficient box queries is increasingly important in application domains such as genome sequence analysis. The BoND-tree was recently introduced as an index structure designed specifically for box queries in an NDDS. Earlier work focused on strategies for building an effective BoND-tree that achieves high query performance; efficient and effective techniques for deleting indexed vectors from the BoND-tree remain an open issue. In this paper, we present three deletion algorithms based on different underflow-handling strategies in an NDDS. Our study shows that incorporating a new BoND-tree-inspired heuristic can improve performance compared with the traditional underflow-handling heuristics used in NDDSs.
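To make the notion of a box query concrete, the following minimal Python sketch evaluates one by a linear scan over a small set of vectors. It is an illustration only, not the BoND-tree algorithm: the names `matches_box` and `box_query_scan`, the DNA-style alphabet, and the sample data are assumptions introduced here, and a real index would prune the search rather than scan every vector.

```python
from typing import List, Sequence, Set

# Hypothetical representation: a vector is a sequence of symbols from a
# non-ordered alphabet, and a box query holds one allowed-value set per dimension.
Vector = Sequence[str]
BoxQuery = Sequence[Set[str]]


def matches_box(vector: Vector, query: BoxQuery) -> bool:
    """Return True if every component of `vector` is in the allowed set for its dimension."""
    if len(vector) != len(query):
        raise ValueError("vector and query must have the same number of dimensions")
    return all(value in allowed for value, allowed in zip(vector, query))


def box_query_scan(dataset: Sequence[Vector], query: BoxQuery) -> List[Vector]:
    """Answer a box query by brute force (no index), keeping dataset order."""
    return [vector for vector in dataset if matches_box(vector, query)]


# Illustrative data: 4-dimensional vectors over the alphabet {A, C, G, T}.
dataset = ["ACGT", "AGGT", "TCGA", "ACGA"]

# Dimension 1 must be A, dimension 2 is C or G, dimension 3 is G, dimension 4 is T or A.
query = [{"A"}, {"C", "G"}, {"G"}, {"T", "A"}]

result = box_query_scan(dataset, query)
print(result)
```

Because the values in each dimension have no natural order, the query is expressed as explicit sets rather than ranges; this is the property that distinguishes a box query in an NDDS from a range query in a continuous space.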
