Querying Semi-structured Data with Mutual Exclusion

Data analytics applications, content-based collaborative platforms and office applications require the integration and management of current and historical data from heterogeneous sources. XML is a standard format for exchanging information, and its semi-structured nature makes it a good candidate data model for integrating and managing heterogeneous content. However, managing historical and collaboratively created data requires extending the original XML model with constraint-based, probabilistic and temporal aspects. We consider an extension of the XML data model with mutual exclusion between nodes, for the purpose of managing versions in XML databases. Query processing algorithms for ordinary XML data rely on the parent-child, ancestor-descendant, sibling and lowest common ancestor relationships between nodes. In this paper, we extend existing labeling schemes and query processing algorithms to process queries over XML data with mutual exclusion. We focus on structured twig pattern queries and show that the same technique can also be applied to keyword queries. We empirically evaluate the performance of the proposed techniques.
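To make the role of node labels concrete, the following is a minimal sketch and not the scheme proposed in the paper. It assumes Dewey-style prefix labels and a hypothetical `is_mux` flag marking a node whose children are mutually exclusive alternatives (for example, two versions of the same section). Under these assumptions, the ancestor-descendant, parent-child and lowest common ancestor tests reduce to prefix comparisons on labels, and two nodes are treated as mutually exclusive when their lowest common ancestor is such a flagged node. All names (`Node`, `assign_dewey`, `mutually_exclusive`) are illustrative.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list[Node] = field(default_factory=list)
    # Hypothetical marker (an assumption of this sketch): the children of
    # this node are alternatives, at most one of which holds at a time.
    is_mux: bool = False
    label: tuple[int, ...] = ()


def assign_dewey(root: Node) -> dict[tuple[int, ...], Node]:
    """Assign Dewey labels top-down and return a label -> node index."""
    index: dict[tuple[int, ...], Node] = {}
    stack = [(root, (1,))]
    while stack:
        node, label = stack.pop()
        node.label = label
        index[label] = node
        for i, child in enumerate(node.children, start=1):
            stack.append((child, label + (i,)))
    return index


def is_ancestor(a: tuple[int, ...], d: tuple[int, ...]) -> bool:
    """a is a proper ancestor of d iff a is a proper prefix of d."""
    return len(a) < len(d) and d[:len(a)] == a


def is_parent(a: tuple[int, ...], d: tuple[int, ...]) -> bool:
    """a is the parent of d iff a is d with its last component removed."""
    return len(a) + 1 == len(d) and is_ancestor(a, d)


def lca(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    """The lowest common ancestor label is the longest common prefix."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def mutually_exclusive(a, b, index) -> bool:
    """True if a and b lie under different children of a flagged node."""
    common = lca(a, b)
    if common == a or common == b:
        return False  # same node, or one is an ancestor of the other
    return index[common].is_mux


# Example document: a section stored in two alternative versions.
doc = Node("doc", [
    Node("title"),
    Node("section", [
        Node("v1", [Node("para")]),
        Node("v2", [Node("para")]),
    ], is_mux=True),
])
index = assign_dewey(doc)
```

In this sketch a twig match or keyword answer that combines `para` under `v1` with `para` under `v2` could be rejected by checking `mutually_exclusive` on each pair of matched labels, whereas combining `title` with either version would be accepted.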
