Processing XML Queries with Structural and Full-Text Constraints

Efficient query processing over XML data is an important task for querying the data web. In this paper, we consider XML queries that can be represented as query trees with twig patterns and that also contain complex full-text constraints. We propose two approaches. The structure-first approach first identifies the elements that satisfy the tag constraints and then evaluates the full-text constraints on the terms contained in each element; the qualifying elements are then combined to satisfy the complete twig constraints. The keyword-first approach, in contrast, first identifies the elements that contain the required keywords and then returns those that satisfy the given full-text predicates and structural constraints. An extensive experimental study demonstrates that each approach has its own merits.
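
To make the contrast between the two evaluation orders concrete, the following is a minimal sketch and not the algorithms of the paper. It assumes a toy document, reduces the twig pattern to a single parent/child edge, and treats the full-text constraint as a plain conjunction of keywords. The names `structure_first`, `keyword_first`, `terms`, and `DOC` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Toy document (an assumption for illustration only).
DOC = """
<lib>
  <book><title>XML twig joins</title><abstract>holistic twig matching</abstract></book>
  <book><title>Keyword search</title><abstract>ranked search over XML</abstract></book>
  <article><title>XML keyword search</title></article>
</lib>
"""

def terms(elem):
    """Lowercased whitespace-separated tokens of all text under elem."""
    return set(" ".join(elem.itertext()).lower().split())

def structure_first(root, parent_tag, child_tag, keywords):
    """Match the tags first, then test the keywords inside each match."""
    hits = []
    for parent in root.iter(parent_tag):          # tag constraint on the parent
        for child in parent.findall(child_tag):   # parent/child edge of the twig
            if keywords <= terms(child):          # conjunctive full-text test
                hits.append(child)
    return hits

def keyword_first(root, parent_tag, child_tag, keywords):
    """Look up the keywords first, then filter candidates by structure."""
    elems = list(root.iter())                     # elements in document order
    parent = {c: p for p in elems for c in p}     # child -> parent map
    index = defaultdict(set)                      # term -> positions of elements
    for pos, elem in enumerate(elems):
        for term in terms(elem):
            index[term].add(pos)
    if keywords:
        candidates = set.intersection(*(index[k] for k in keywords))
    else:
        candidates = set(range(len(elems)))
    return [
        elems[i] for i in sorted(candidates)
        if elems[i].tag == child_tag
        and elems[i] in parent
        and parent[elems[i]].tag == parent_tag
    ]

root = ET.fromstring(DOC)
query = ("book", "title", {"xml"})
print([e.text for e in structure_first(root, *query)])
print([e.text for e in keyword_first(root, *query)])
```

Both functions return the same elements for the same query; they differ only in which constraint prunes the search space first. In this sketch, structure-first does less work when few elements carry the required tags, while keyword-first does less work when the keywords are rare, which is one way to read the claim that each approach has its own merits.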
