Compiling XQuery to Java bytecodes

AbstractSequence is an abstract class that is used for many purposes: nodes, sequences, Scheme lists and vectors. The NodeTree sub-class is used for nodes. It stores an entire document or document fragment in two arrays: a character array and an Object array. “Pointers” between nodes are relative indexes stored in the character array using one or two 16-bit characters. The representation uses a “buffer gap”, which allows efficient insertion and deletion of nodes near the gap. This representation is very compact, easy to append to, and supports efficient navigation (though some tuning of the basic design may be worthwhile).

A position cookie is just an index into the character array. This works fine for read-only nodes. For modifiable nodes and documents we use an indirection table: the indexes into the indirection array serve as the magic cookies, while the values stored in the array are indexes into the NodeTree’s character array.

The XMark “standard” 100MB test file (116 million bytes) is read by Qexo into an array of 104 million 16-bit Java characters, plus a 200-element object array of pointers to shared element and attribute names. It took a little over a minute to read the file on a 1GHz PowerBook with 512MB of memory. Simple XPath selections using this representation run very quickly. In contrast, Saxon 7.9.1 needed a heap about twice as big and took almost 8 times as long, largely due to increased paging. (The “user” process time was only 50% more with Saxon.)

To handle large persistent or remote databases, Qexo would need a new class derived from AbstractSequence. This class would handle caching and communication with the database. It would manage position integers, which could be database keys or other proxies for the actual database nodes. That is not to imply this would be a trivial task: there are some places in Qexo that assume NodeTree, and they would have to be generalized. Making use of indexes would require teaching the Qexo optimizer about them.
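The indirection table used for modifiable nodes can be illustrated with a small sketch. This is not Kawa's NodeTree code: the class CookieTable and all of its method names are hypothetical, and it assumes a plain character array that shifts its content on insertion rather than a buffer gap, so that only the cookie mechanism is shown.

```java
import java.util.Arrays;

/** Hypothetical sketch (not Kawa's NodeTree): stable cookies via an indirection table. */
class CookieTable {
    private char[] data = new char[16];  // stands in for the NodeTree character array
    private int length = 0;              // number of characters currently in use
    private int[] offsets = new int[4];  // indirection table: cookie -> index into data
    private int cookieCount = 0;

    /** Appends text at the end and returns the index at which it starts. */
    int append(String text) {
        ensureCapacity(length + text.length());
        text.getChars(0, text.length(), data, length);
        int start = length;
        length += text.length();
        return start;
    }

    /** Creates a cookie that keeps designating the character now at the given index. */
    int createCookie(int index) {
        if (cookieCount == offsets.length)
            offsets = Arrays.copyOf(offsets, 2 * offsets.length);
        offsets[cookieCount] = index;
        return cookieCount++;
    }

    /** Resolves a cookie to its current index into the character array. */
    int resolve(int cookie) {
        return offsets[cookie];
    }

    /** Inserts text at index; table entries at or after index are shifted. */
    void insert(int index, String text) {
        int n = text.length();
        ensureCapacity(length + n);
        // System.arraycopy copes with overlapping source and destination ranges.
        System.arraycopy(data, index, data, index + n, length - index);
        text.getChars(0, n, data, index);
        length += n;
        for (int i = 0; i < cookieCount; i++)
            if (offsets[i] >= index)
                offsets[i] += n;
    }

    /** Returns the character that the cookie currently designates. */
    char charAt(int cookie) {
        return data[resolve(cookie)];
    }

    private void ensureCapacity(int needed) {
        if (needed > data.length)
            data = Arrays.copyOf(data, Math.max(needed, 2 * data.length));
    }
}
```

In this sketch a client holds only the cookie. An insertion earlier in the array changes the value stored in the table, but the cookie itself stays valid, which is the property a raw index into the character array cannot offer once the content is modified.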
Updates and transactions will bring in a whole new set of issues.

For convenience, Kawa provides a set of wrapper classes that implement the W3C DOM interfaces. For example, the class KNode implements the org.w3c.dom.Node interface. A KNode is an object that has two fields: a reference to an AbstractSequence container, and a 32-bit integer position. A KNode does not carry node identity, and can be quite transitory. It is used when a node needs to be represented as an Object.

11. EXTENSIONS

Qexo has some non-standard extended features. Here are some of the more interesting ones.

11.1 Calling Java Methods

Qexo (following many XSLT implementations) uses special namespaces to name Java classes. For example:

    declare namespace JInt = "class:java.lang.Integer";
    JInt:toHexString(255)

This invokes the static toHexString(int) method in the Java class java.lang.Integer, evaluating to the string "ff". You can also invoke non-static methods (passing the this receiver as the first parameter), or construct new objects (using new as the method name). The compiler picks the best matching method using the available type information.

As a further convenience, you can use a class name directly as a prefix, provided there is no matching in-scope namespace and the class is in the compile-time classpath. For example:

    java.lang.Object:toString(3+4)

This calls the toString method of the object representing 7, yielding the string "7".

11.2 Servlets

Kawa has built-in support for automatically compiling a query written in XQuery (or another Kawa-supported language) to a servlet. A servlet is a kind of Java class that is executed in an appropriate web server in response to HTTP requests. The result of the query becomes the HTTP response. Here is a trivial but valid servlet: