XML Structural Join Based on Extended Region Coding

Abstract XML has become a standard technology for exchanging a wide variety of data on the web and the Internet because it is structured, labeled, portable and extensible. Querying XML documents efficiently has therefore become an urgent task. At present, most XML indexing and query methods rely on encoding the XML document tree, and many encoding schemes have been proposed. Region coding is the mainstream and most commonly used family, with examples such as Dietz coding, Li-Moon coding, Zhang coding and Wan coding. This paper proposes an extended region coding built on region coding. The XML document tree is traversed in preorder, and the preorder numbers of all descendants of a node are taken as that node's region. During a structural join, if the preorder number of a node falls within this region, the structural relation between the two nodes is guaranteed. The extended region coding therefore allows structural relations to be judged effectively without traversing the XML document tree. Structural join algorithms for XML path queries have also received considerable attention recently, and researchers have proposed several good algorithms for the problem. The Stack-Tree-Desc algorithm is one of them: it needs to scan the ancestor list and the descendant list only once each to decide ancestor/descendant structural relations, but it still scans some nodes that do not take part in the join. If the element nodes in the ancestor list and descendant list that do not need to participate in the structural join can be skipped, query efficiency improves. This paper therefore proposes an improved algorithm based on Stack-Tree-Desc, the Indexed Stack-Tree-Desc algorithm, which introduces an index structure to avoid scanning unwanted nodes, so that a full sequential scan is unnecessary and query time is shortened accordingly. The improved algorithm can quickly judge structural relations using the extended region coding presented in this paper. Experiments are conducted to test the effectiveness of the extended region coding and the Indexed Stack-Tree-Desc algorithm. The experimental results show that the proposed method is effective.
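
The idea in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the tuple layout `(tag, pre, lo, hi)`, the function names `encode`, `is_ancestor` and `structural_join`, and the sample document are all hypothetical. The region is stored as the closed interval `[lo, hi]` of preorder numbers covered by a node's descendants, and a binary search over the sorted descendant list stands in for the index structure, whose actual design the abstract does not specify. The join shown is a simplified skip-based join, not the Indexed Stack-Tree-Desc algorithm itself.

```python
from bisect import bisect_left, bisect_right
import xml.etree.ElementTree as ET


def encode(root):
    """Return (tag, pre, lo, hi) for every element, in document order.

    pre is the preorder number of the element; [lo, hi] is the range of
    preorder numbers of its descendants (empty, lo > hi, for a leaf).
    """
    codes = []
    counter = 0

    def visit(elem):
        nonlocal counter
        counter += 1
        pre = counter
        slot = len(codes)
        codes.append(None)          # reserve the slot to keep document order
        for child in elem:
            visit(child)
        # After the subtree is visited, counter is the last descendant's number.
        codes[slot] = (elem.tag, pre, pre + 1, counter)

    visit(root)
    return codes


def is_ancestor(a, d):
    """True if node d lies inside the descendant region of node a."""
    return a[2] <= d[1] <= a[3]


def structural_join(alist, dlist):
    """Ancestor/descendant join over two lists sorted by preorder number.

    For each ancestor, binary search jumps directly to the descendants
    inside its region, so nodes outside the region are never examined.
    """
    d_pres = [d[1] for d in dlist]
    pairs = []
    for _tag, pre, lo, hi in alist:
        start = bisect_left(d_pres, lo)
        end = bisect_right(d_pres, hi)
        pairs.extend((pre, p) for p in d_pres[start:end])
    return pairs


# Hypothetical sample document, evaluating the path query a//c.
doc = ET.fromstring("<a><b><c/></b><a><c/><c/></a><d/></a>")
codes = encode(doc)
a_nodes = [c for c in codes if c[0] == "a"]
c_nodes = [c for c in codes if c[0] == "c"]
result = structural_join(a_nodes, c_nodes)
```

In this sketch the ancestor/descendant test is a single interval check on preorder numbers, so no tree traversal is needed at join time, which is the property the abstract attributes to the extended region coding.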
