XML Query Processing Using a Schema-Based Numbering Scheme

Establishing the hierarchical order among XML elements is an essential function of XML query processing techniques. Although most XML documents have an associated DTD or XML schema, the query processing techniques proposed so far have not used this document structure information efficiently. In this paper, we propose a novel technique that uses the DTD or XML schema to reduce the disk I/O complexity of XML query processing. We present a schema-based numbering scheme called SPIDER that incorporates both structure information and tag names extracted from the document structure descriptions. Given the tag name and the identifier of an element, SPIDER can determine the tag names and the identifiers of its ancestor elements without disk I/O. Based on SPIDER, we designed a mechanism called VirtualJoin that significantly reduces the disk I/O workload of processing XML queries. Our experiments indicate that SPIDER outperforms the structural join techniques Stack-Tree and PathStack in XML query processing, especially for queries with heavy join workloads and for large data sets.
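To make the idea of resolving ancestors without disk I/O concrete, the following is a minimal sketch under stated assumptions. It is not SPIDER's actual encoding, which the abstract does not specify. It uses a simplified Dewey-style stand-in: an element identifier pairs a schema node index with the ordinals along the path from the root, and a small in-memory schema tree supplies the tag names. The names `SchemaNode`, `SCHEMA`, `ElementId`, and `ancestors`, as well as the example schema, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SchemaNode:
    tag: str
    parent: Optional[int]  # index of the parent schema node; None for the root


# Hypothetical in-memory schema tree, as might be derived from a
# non-recursive DTD: site -> people -> person -> name
SCHEMA: List[SchemaNode] = [
    SchemaNode("site", None),   # index 0
    SchemaNode("people", 0),    # index 1
    SchemaNode("person", 1),    # index 2
    SchemaNode("name", 2),      # index 3
]

# Assumed identifier layout (not SPIDER's): the schema node index plus
# one ordinal per level on the path from the root down to the element.
ElementId = Tuple[int, Tuple[int, ...]]


def ancestors(element: ElementId) -> List[Tuple[str, ElementId]]:
    """Return (tag, identifier) for each ancestor, nearest first.

    Only the in-memory schema and the identifier itself are consulted;
    no element data is read.
    """
    schema_idx, ordinals = element
    result: List[Tuple[str, ElementId]] = []
    parent = SCHEMA[schema_idx].parent
    ordinals = ordinals[:-1]  # drop the element's own ordinal
    while parent is not None:
        result.append((SCHEMA[parent].tag, (parent, ordinals)))
        parent = SCHEMA[parent].parent
        ordinals = ordinals[:-1]
    return result


# The first "name" under the third "person" of the first "people" element.
name_id: ElementId = (3, (1, 1, 3, 1))
for tag, ident in ancestors(name_id):
    print(tag, ident)
```

In this sketch the ancestor identifiers are computed purely from the element's own identifier and the schema, which is the property the abstract attributes to SPIDER; how SPIDER packs this information into its identifiers is not shown here.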
