XCpaqs: compression of XML documents with XPath query support

Information in XML format contains substantial redundancy, which wastes disk space, network bandwidth and disk I/O when XML data are queried. Efficient storage and querying of XML therefore requires XML data to be compressed. This paper presents XCpaqs, an XML compression technique. XCpaqs separates an XML document into structure and content information while preserving a homomorphism between the compressed and the original document. XCpaqs encodes tags and paths separately, which allows parts of an XPath query to be processed in main memory. XCpaqs also recognizes data types and applies a different encoding strategy to each type, so it can compress XML documents that come without schema information. This makes XCpaqs well suited to XML warehouses, which store XML documents with various schemas gathered from the Internet. The paper also presents a technique for executing queries over XML data compressed by XCpaqs.
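To make the abstract's terms concrete, the sketch below shows one generic way to split an XML document into a structure stream and typed content containers, with separate dictionaries for tags and for root-to-node paths. It is an illustrative assumption, not the XCpaqs encoding: the names `separate`, `lookup` and `END` are hypothetical, attributes and tail (mixed-content) text are ignored, and type recognition is reduced to "integer or string". Because the structure stream lists tag codes in document order with an end marker per element, it mirrors the original tree; a simple absolute path can be answered from the in-memory path dictionary without scanning the content.

```python
import xml.etree.ElementTree as ET

END = -1  # hypothetical end-of-element marker in the structure stream


def separate(xml_text):
    """Split XML into tag/path dictionaries, a structure stream and typed containers.

    Illustrative sketch only (not the XCpaqs format): attributes and tail text
    are ignored, and values are classified only as int or str.
    """
    root = ET.fromstring(xml_text)
    tag_codes = {}   # tag name -> integer code
    path_codes = {}  # root-to-node path such as "/lib/book" -> integer code
    structure = []   # tag codes in document order; END closes an element
    containers = {}  # path code -> {"int": [...], "str": [...]}

    def code_of(table, key):
        # Assign the next free integer code the first time a key is seen.
        return table.setdefault(key, len(table))

    def visit(elem, parent_path):
        path = parent_path + "/" + elem.tag
        structure.append(code_of(tag_codes, elem.tag))
        pid = code_of(path_codes, path)
        text = (elem.text or "").strip()
        if text:
            bucket = containers.setdefault(pid, {"int": [], "str": []})
            # Simple type recognition: integers and strings go to separate
            # containers so each could use its own compression strategy.
            # (A path mixing both types would need extra bookkeeping to
            # restore value order; this sketch omits it.)
            try:
                bucket["int"].append(int(text))
            except ValueError:
                bucket["str"].append(text)
        for child in elem:
            visit(child, path)
        structure.append(END)

    visit(root, "")
    return tag_codes, path_codes, structure, containers


def lookup(path_codes, containers, path):
    """Answer a simple absolute path using only the in-memory path dictionary."""
    pid = path_codes.get(path)
    if pid is None:
        return []
    bucket = containers.get(pid, {"int": [], "str": []})
    return bucket["int"] + bucket["str"]


doc = ("<lib><book><title>XML</title><year>2004</year></book>"
       "<book><title>XPath</title><year>2005</year></book></lib>")
tags, paths, structure, containers = separate(doc)
print(structure)  # [0, 1, 2, -1, 3, -1, -1, 1, 2, -1, 3, -1, -1, -1]
print(lookup(paths, containers, "/lib/book/year"))  # [2004, 2005]
```

Keeping integers apart from strings is what would let each container use a type-specific coder (for example, delta coding for numbers); the sketch stops at the separation step and does not compress the containers themselves.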