XML data partitioning schemes for parallel holistic twig joins

Purpose – The purpose of this paper is to propose two Extensible Markup Language (XML) data partitioning schemes for parallel holistic twig joins that can cope with static and dynamic allocation: the grid metadata model for XML (GMX) and the streams-based partitioning method for XML (SPX).

Design/methodology/approach – GMX exploits the relationships between XML documents and query patterns to perform workload-aware partitioning of XML data. Specifically, the paper constructs a two-dimensional model with a document dimension and a query dimension, in which each object in a dimension is composed of XML metadata related to that dimension. GMX provides a set of XML data partitioning methods (document clustering, query clustering, document-based refinement, query-based refinement, and query-path refinement), thereby enabling XML data partitioning based on the static information of XML metadata. In contrast, SPX explores the structural relationships of query elements and a range-containment property of XML stream...
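To make the idea of a document dimension crossed with a query dimension more concrete, the following is a minimal sketch and not the paper's actual GMX algorithm. It assumes, purely for illustration, that document metadata is a count of element tags, that query metadata is the set of tags in a twig pattern, and that the cost of a grid cell is the total length of the tag streams a twig join would scan. The names `build_grid` and `partition_documents`, the cost estimate, and the greedy assignment are all hypothetical.

```python
from collections import Counter


def build_grid(documents, queries):
    """Build a document x query grid of estimated costs (illustrative only).

    documents: dict of doc_id -> Counter of element tag counts (assumed metadata)
    queries:   dict of query_id -> set of tags in the twig pattern (assumed metadata)
    A cell is 0 when the document lacks any tag the query needs, because the
    twig pattern cannot match; otherwise it is the summed size of the tag streams.
    """
    grid = {}
    for doc_id, tag_counts in documents.items():
        row = {}
        for query_id, query_tags in queries.items():
            if all(tag_counts[t] > 0 for t in query_tags):
                row[query_id] = sum(tag_counts[t] for t in query_tags)
            else:
                row[query_id] = 0
        grid[doc_id] = row
    return grid


def partition_documents(grid, n_nodes):
    """Assign documents to nodes greedily, heaviest document first.

    This is a generic load-balancing heuristic chosen for the sketch,
    not one of the clustering or refinement methods named in the abstract.
    """
    loads = [0] * n_nodes
    parts = [[] for _ in range(n_nodes)]
    doc_cost = {d: sum(row.values()) for d, row in grid.items()}
    # Sort by descending cost, breaking ties by document id for determinism.
    for doc_id in sorted(doc_cost, key=lambda d: (-doc_cost[d], d)):
        target = loads.index(min(loads))  # least-loaded node so far
        parts[target].append(doc_id)
        loads[target] += doc_cost[doc_id]
    return parts, loads


# Hypothetical toy workload: three documents and two twig queries.
documents = {
    "d1": Counter(book=4, title=4, author=6),
    "d2": Counter(book=2, title=2),
    "d3": Counter(article=5, author=5),
}
queries = {"q1": {"book", "title"}, "q2": {"book", "author"}}

grid = build_grid(documents, queries)
parts, loads = partition_documents(grid, n_nodes=2)
print(grid)
print(parts, loads)
```

Because the grid is built only from metadata known before query execution, a partitioning of this kind is static; it cannot react to imbalance that appears at run time.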
