XTree: A New XML Keyword Retrieval Model

As more and more data are represented and stored in XML format, querying XML data has become an increasingly important research issue. Keyword search is a proven, user-friendly way of querying HTML documents, and it is well suited to XML trees as well. However, two questions remain open in XML keyword retrieval: which XML nodes are meaningful and reasonable answers to a query, and how to find these nodes effectively and efficiently. In recent years, many XML keyword retrieval models, such as XRANK and SLCA, have been proposed to address this problem. These models usually return the most specific results and discard most ancestor nodes, so the returned results may not contain enough information for users to understand them easily. In this paper, we present a new XML keyword retrieval model, XTree, which covers every keyword node and returns comprehensive result trees. For the XTree model, we propose the Xscan algorithm for processing keyword queries and the GenerateTree algorithm for constructing results. We evaluate the performance of our algorithms both analytically and experimentally, and the experiments show that they are efficient.
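To make the SLCA semantics mentioned in the abstract concrete, here is a minimal brute-force sketch. It assumes every node is labeled with a Dewey ID (a tuple of child positions, with the root as the empty tuple) and that each keyword maps to the list of nodes containing it. It follows the standard SLCA definition (a node whose subtree contains every keyword, with no proper descendant that also does). It is not the paper's Xscan or GenerateTree algorithm, nor an optimized SLCA algorithm from the literature, and the function names and sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A Dewey ID: the path of child positions from the root; the root is ().
Dewey = Tuple[int, ...]


def is_ancestor_or_self(a: Dewey, b: Dewey) -> bool:
    """True if node a is b itself or an ancestor of b (a is a prefix of b)."""
    return b[:len(a)] == a


def slca(keyword_nodes: Dict[str, List[Dewey]]) -> List[Dewey]:
    """Brute-force SLCA: smallest subtrees that contain every keyword."""
    # Candidate roots: every ancestor-or-self of every keyword node.
    candidates: Set[Dewey] = set()
    for nodes in keyword_nodes.values():
        for node in nodes:
            for i in range(len(node) + 1):
                candidates.add(node[:i])

    # A candidate covers the query if its subtree holds a node for each keyword.
    covering = [
        c for c in candidates
        if all(any(is_ancestor_or_self(c, n) for n in nodes)
               for nodes in keyword_nodes.values())
    ]

    # Keep only the smallest covering nodes: discard any covering node that
    # has a covering proper descendant (this is where ancestors are dropped).
    result = [
        c for c in covering
        if not any(d != c and is_ancestor_or_self(c, d) for d in covering)
    ]
    return sorted(result)


# Toy document: the root () has two children, (0,) and (1,).
# "xml" appears in (0, 0) and (1, 0); "chen" appears only in (0, 1).
query = {"xml": [(0, 0), (1, 0)], "chen": [(0, 1)]}
print(slca(query))  # [(0,)]
```

For this toy input the only SLCA is (0,). The root (), although it also contains both keywords, is discarded because it has a covering descendant, and the second child (1,), which matches only "xml", is not returned at all. This discarding of ancestor context is the behavior the abstract contrasts with XTree's comprehensive result trees.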
