TREX: DTD-conforming XML to XML transformations

With the popularity of XML, it is increasingly common to find data in the XML format. This raises an important question: given an XML document S and a DTD D, how can one extract data from S and construct another XML document T such that T conforms to the fixed D? We refer to this as DTD-conforming XML to XML transformation. The need for it is evident in, e.g., data exchange, where enterprises exchange their XML documents with respect to a certain predefined DTD. Although a number of XML query languages (e.g., XQuery, XSLT) are currently used to transform XML data, they cannot guarantee DTD conformance. Type inference and (static) type checking for XML transformations are too expensive [1] to be used in practice; worse, they provide no guidance on how to specify a DTD-conforming XML to XML transformation.

In response to this need, we have developed TREX (TRansformation Engine for XML), a middleware system for DTD-conforming XML to XML transformations. TREX is based on the novel notion of an XTG (XML Transformation Grammar), which extends a DTD by incorporating semantic rules defined with XML queries (expressed in Quilt [5]). This allows one to specify how to extract relevant data from a source XML document via the queries, and to construct a target XML document directed by the embedded DTD. TREX supports XTGs using Kweelt [6] as the underlying engine for XML queries (we chose Quilt rather than XQuery/XSL because we had access to the source code of Kweelt, which let us incorporate our optimization algorithms). Given an XTG and a source document, TREX provides two evaluation modes: (1) in the batch mode, it generates a complete XML document, which is guaranteed to conform to the DTD embedded in the XTG; (2) in the lazy mode, it constructs a partial XML (DOM) tree, interacts with users, and expands the tree upon users' requests. As observed by [3], the lazy mode allows users to generate partial XML docu-