EXTENDED TREE-PATTERN CLUSTERING TECHNIQUES FOR MASSIVE XML STORAGES

The Extensible Markup Language (XML) is an emerging standard for describing data on the Web. As widespread use of the Internet and the Web generates vast amounts of data every day, the manipulation of such semi-structured textual data is becoming an important issue in XML storage research. Semi-structured data is generally well suited to storage in a tree-like form, and locating data in this form relies on tree-pattern matching techniques. As a result, evaluating path expressions effectively is the key to providing efficient access to such tree-like data storage. In this paper, we apply two novel signature-based access methods that can significantly extend the scope of tree-pattern clustering in order to navigate massive XML databases. We present the process of producing the signatures in detail and provide algorithms that demonstrate how they work. We also show the advantages of using extended tree-pattern clustering techniques in handling large amounts of XML documents.
