Paper Information - Compact Labeling Scheme for XML Ancestor Queries

Compact Labeling Scheme for XML Ancestor Queries

Abstract: XML documents are often viewed as trees (basically the parse tree of the document), and queries over such documents typically test for ancestor relationships among tree nodes. Search engines process such queries using an index structure summarizing the ancestor relations. In the index, each document item (tree node) is identified using some logical id (node label), such that, given two labels, the engine can determine the ancestor relationship between the corresponding nodes. The length of the labels is a main factor of the index size. Therefore, reducing this length, even by a constant factor, is a critical issue. In this work we consider the following problem. Given a rooted XML tree T, label the nodes of T in the most compact way such that, given the labels of two nodes, one can determine in constant time, by looking at the labels only, whether one node is an ancestor of the other. Labelings currently being used are all variants of the following interval scheme. Number the leaves, say, from left to right, and label each node with a pair consisting of the numbers of its smallest and largest leaf descendants. An ancestor query then amounts to an interval containment test on the labels. The maximum label length using this scheme is 2 log n, where n is the number of nodes in the tree. (All logarithms in this paper are to base 2.) The focus of this work is finding a scheme that works best in practice on real XML data. We suggest an orthogonal prefix-based approach, where the labeling is such that an ancestor query roughly amounts to testing whether one label is a prefix of the other. We present several new labeling schemes based on this approach and analyze their performance both theoretically and empirically.
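The two labeling ideas named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy tree `TREE`, the function names, and in particular the unary child code (`"1" * i + "0"`) used for the prefix labels, which is only one simple prefix-free choice and is not claimed to be any of the schemes proposed in the paper. Both tests below report "ancestor or self"; also note that the plain leaf-numbering interval scheme gives a node with a single child the same interval as that child, which is presumably one reason practical labelings are variants of it.

```python
from typing import Dict, List, Tuple

# Hypothetical toy tree: node name -> ordered list of children.
TREE: Dict[str, List[str]] = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2", "b3"],
    "a1": [], "a2": [], "b1": [], "b2": [], "b3": [],
}


def interval_labels(tree: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Number leaves left to right; label each node with the
    (smallest, largest) leaf number among its descendants."""
    labels: Dict[str, Tuple[int, int]] = {}
    next_leaf = 0

    def visit(node: str) -> None:
        nonlocal next_leaf
        children = tree[node]
        if not children:
            labels[node] = (next_leaf, next_leaf)
            next_leaf += 1
            return
        for child in children:
            visit(child)
        labels[node] = (labels[children[0]][0], labels[children[-1]][1])

    visit(root)
    return labels


def interval_is_ancestor(x: Tuple[int, int], y: Tuple[int, int]) -> bool:
    """Interval containment test: True if x is an ancestor of y (or x == y)."""
    return x[0] <= y[0] and y[1] <= x[1]


def prefix_labels(tree: Dict[str, List[str]], root: str) -> Dict[str, str]:
    """Label each child with its parent's label plus a prefix-free code.
    The i-th child gets '1' * i + '0' (an illustrative choice only)."""
    labels: Dict[str, str] = {root: ""}
    stack = [root]
    while stack:
        node = stack.pop()
        for i, child in enumerate(tree[node]):
            labels[child] = labels[node] + "1" * i + "0"
            stack.append(child)
    return labels


def prefix_is_ancestor(x: str, y: str) -> bool:
    """Prefix test: True if x is an ancestor of y (or x == y)."""
    return y.startswith(x)


il = interval_labels(TREE, "root")
pl = prefix_labels(TREE, "root")
print(il["b"], interval_is_ancestor(il["b"], il["b3"]))
print(pl["b3"], prefix_is_ancestor(pl["b"], pl["b3"]))
```

Because the child codes are prefix-free, one label is a prefix of another exactly when the first node lies on the root path of the second. The unary code grows linearly with the number of siblings, so it is only meant to show the shape of the test, not a compact encoding.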

Haim Kaplan | Tova Milo | Ronen Shabo
