Exploiting native XML indexing techniques for XML retrieval in relational database systems

In XML retrieval, two distinct approaches have been established and pursued with little cross-fertilization so far. On the one hand, native XML databases tailored to the semistructured data model have received considerable attention, and a wealth of index structures, join algorithms, tree encodings, and query rewriting techniques for XML have been proposed. On the other hand, the question of how to make XML fit the relational data model has been studied in great detail, giving rise to a multitude of storage schemes for XML in relational database systems (RDBSs). In this paper, we examine how native XML indexing techniques can boost the retrieval of XML stored in an RDBS. We present the Relational CADG (RCADG), an adaptation of several native indexing approaches to the relational model, and show how it supports the evaluation of a clean formal language of conjunctive XML queries. Unlike relational storage schemes for XML, the RCADG largely preserves the underlying tree structure of the data in the RDBS, thus addressing several open problems known from the literature. Experiments show that the RCADG accelerates retrieval by up to two or even three orders of magnitude compared to both native and relational approaches.
