The XXL search engine: ranked retrieval of XML data using indexes and ontologies

1. Motivation

XML is becoming the standard for integrating and exchanging data over the Internet and within intranets, covering the complete spectrum from largely unstructured, ad hoc documents to highly structured, schematic data. For searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of relevance. We have developed a core language, named XXL for "flexible XML search language" [1], for ranked retrieval of XML data using regular element path expressions and search conditions over element contents. For similarity search we have introduced a new operator "~", which can be used both for element content comparisons and for approximate matching of element names. On the XML Shakespeare play collection, for example, we can search for scenes where a woman talks about leadership in the presence of Macbeth by the XXL query: