Bitmap Filtering: An Efficient Speedup Method for XML Structural Matching

With the proliferation of XML data on the internet, there is a growing demand for efficient XML structural matching techniques. To accelerate XML structural matching, we propose a novel filtering method based on two auxiliary bitmaps: the suffix bitmap and the prefix bitmap. For each node in an XML document, the suffix bitmap compactly records the tag names that occur in the node's suffix subtree (the subtree below it), and the prefix bitmap records the tag names that occur on its prefix path (the path from the root down to it). During structural matching, most non-matching candidate nodes can be filtered out cheaply by comparing each candidate's bitmaps against the tags the query requires; for example, a candidate whose suffix bitmap lacks a tag that the query requires beneath it cannot be a match. We integrate bitmap filtering into two categories of structural matching algorithms: navigation-based and join-based. Experimental results demonstrate that bitmap filtering significantly improves the performance of XML structural matching.
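The filtering idea can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: each distinct tag owns one bit (Python ints serve as unbounded bit sets), both bitmaps exclude the node's own tag, and the helper names (`assign_bitmaps`, `passes_filter`, `tag_bits`, `iter_nodes`) are hypothetical. The check is only a necessary condition, so it should never discard a true match, but surviving candidates still have to be verified by the underlying navigation-based or join-based algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A document node plus its two filtering bitmaps (Python ints used as bit sets)."""
    tag: str
    children: list = field(default_factory=list)
    suffix_bits: int = 0  # assumption: tags of proper descendants only
    prefix_bits: int = 0  # assumption: tags of proper ancestors only


def tag_bits(tags, tag_index):
    """OR together one bit per tag; unseen tags are assigned the next free bit."""
    bits = 0
    for t in tags:
        bits |= 1 << tag_index.setdefault(t, len(tag_index))
    return bits


def assign_bitmaps(root, tag_index):
    """One DFS: prefix bitmaps are passed down, suffix bitmaps are collected up."""
    def visit(node, ancestor_bits):
        node.prefix_bits = ancestor_bits
        own = tag_bits([node.tag], tag_index)
        below = 0
        for child in node.children:
            below |= visit(child, ancestor_bits | own)
        node.suffix_bits = below
        return below | own  # this subtree's contribution to the parent's suffix bitmap

    visit(root, 0)


def iter_nodes(node):
    """Yield nodes in document (pre-)order."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def passes_filter(node, need_below, need_above):
    """Keep a candidate only if every required tag is present in the matching bitmap."""
    return (need_below & ~node.suffix_bits) == 0 and (need_above & ~node.prefix_bits) == 0


# Toy document: bib(book(title, author), article(title))
doc = Node("bib", [
    Node("book", [Node("title"), Node("author")]),
    Node("article", [Node("title")]),
])
tags = {}
assign_bitmaps(doc, tags)

# Query //book//title: a "title" candidate needs "book" on its prefix path.
need_above = tag_bits(["book"], tags)
survivors = [n for n in iter_nodes(doc)
             if n.tag == "title" and passes_filter(n, 0, need_above)]
print(len(survivors))  # 1: the title under <article> is pruned by its prefix bitmap
```

The suffix bitmap works symmetrically: for a predicate such as `[.//author]`, only nodes whose suffix bitmap contains the `author` bit (here `bib` and `book`) survive. Testing a bitmap costs a couple of bitwise operations, which is what makes pruning cheaper than exploring a candidate's subtree or ancestor chain.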