Enhancing the Estimation Quality of Element-centered XML Summarization Methods

An XML summary should support cardinality estimates of several kinds on an XML document so that query optimizers for languages such as XPath or XQuery can use it flexibly. Conventional methods typically emulate the document structure and record path-oriented statistics for it. In contrast, element-centered XML summarization methods collect statistical information on document nodes and their axis relationships and aggregate it separately for each distinct element or attribute name. This approach has already partially proven superior in estimation quality, space consumption, and evaluation performance. Surprisingly, this kind of inversion appears to support more kinds of estimates than conventional approaches: it is not confined to cardinality estimation for the child and descendant axes but also allows the parent and ancestor axes to be approximated. We therefore refined and extended element-centered XML summarization methods to capture more statistical information, and we propose new estimation procedures. We tested our ideas on a set of documents with widely varying characteristics.
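
To make the element-centered idea concrete, the following is a minimal sketch and not the summarization method of this paper. It assumes a summary that keeps three statistics per element name: the node count, the total number of children per child name, and the number of nodes that have at least one child of a given name. The function names (`build_summary`, `estimate_child_path`, `estimate_parent_step`) and the uniform fan-out assumption are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_summary(xml_text):
    """Aggregate statistics per element name (illustrative, not the paper's layout)."""
    root = ET.fromstring(xml_text)
    count = defaultdict(int)                             # name -> number of nodes
    child_total = defaultdict(lambda: defaultdict(int))  # parent -> child -> number of child nodes
    parents_with = defaultdict(lambda: defaultdict(int)) # parent -> child -> parents having such a child
    for node in root.iter():
        count[node.tag] += 1
        seen = set()
        for child in node:
            child_total[node.tag][child.tag] += 1
            seen.add(child.tag)
        for tag in seen:
            parents_with[node.tag][tag] += 1
    return count, child_total, parents_with


def estimate_child_path(summary, names):
    """Estimate the result size of //n0/n1/.../nk, assuming uniform fan-out per name."""
    count, child_total, _ = summary
    card = float(count.get(names[0], 0))
    for parent, child in zip(names, names[1:]):
        if count.get(parent, 0) == 0:
            return 0.0
        # Average number of `child` children per `parent` node, over all parents of that name.
        card *= child_total[parent][child] / count[parent]
    return card


def estimate_parent_step(summary, child, parent):
    """Size of //child/parent::parent, read directly from the per-name statistics."""
    _, _, parents_with = summary
    return parents_with[parent][child]


doc = "<a><b><c/><c/></b><b><c/></b><d><b/></d></a>"
summary = build_summary(doc)
print(estimate_child_path(summary, ["a", "b", "c"]))  # 2.0 (the true result size is 3)
print(estimate_parent_step(summary, "c", "b"))        # 2
```

Because the statistics are aggregated per name rather than per path, the `b` node below `d` lowers the average fan-out of `b` and the child-path estimate deviates from the true value. The same per-name statistics answer a single reverse (parent) step without any additional structure.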