The Babel of the Semantic Web Tongues – In Search of the Rosetta Stone of Interoperability

We discuss a vision of the Semantic Web in which ontologies, services, and devices interoperate seamlessly across a multitude of protocols and languages. In particular, we discuss the importance of enabling interoperability for Semantic Web technologies at the knowledge representation layer, and give an overview of the Distributed Ontology Language (DOL) addressing this aspect, a first piece of a Rosetta Stone enabling overall interoperability.

The Babylonian Confusion

The Semantic Web has led to a seemingly endless number of different standards: XML, RDF, RDFS and RDFa are used for the exchange of (possibly large) datasets. Ontological knowledge can be represented in the different profiles of OWL, which represent trade-offs between the expressiveness of the language and the effectiveness of the available tools for certain reasoning tasks. At one end of this spectrum there are large ontologies like SNOMED CT, expressed in logics of low expressivity such as OWL EL. In the opposite direction, the spectrum does not end with the relatively expressive, but still decidable, OWL 2 DL; rather, it is common practice (e.g. in bio-ontologies) to intersperse OWL ontologies with first-order axioms in the comments, or to annotate them as having temporal behaviour [Smith et al., 2005; Beisswanger et al., 2008], although, unfortunately, such axioms are ignored by tools.

Foundational ontologies, such as DOLCE, BFO or SUMO, also use full first-order logic (Common Logic is an ISO-standardised language with first-order expressivity) or even first-order modal logic. Even though such ontologies may not always be (considered as) part of the Semantic Web, many of them identify their symbols by IRIs, partial OWL implementations are available (e.g. for DOLCE and BFO), and foundational ontologies serve as a methodological guideline for the design of ontologically well-constructed domain ontologies.
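The trade-off between the OWL profiles can be made concrete with a small example. The following sketch, in OWL Manchester syntax with hypothetical class and property names, shows two axioms of the kind found in medical ontologies: the first stays within the OWL EL profile, which permits conjunction and existential restrictions, while the second uses class negation, which OWL EL forbids but OWL 2 DL allows.

```
Class: Infection
    SubClassOf: ClinicalFinding and (causedBy some Pathogen)

Class: NonViralInfection
    EquivalentTo: Infection and not (causedBy some Virus)
```

An EL reasoner such as ELK can classify ontologies built only from axioms of the first kind in polynomial time, even at SNOMED CT scale; a single axiom of the second kind forces a switch to a full OWL 2 DL reasoner with much higher worst-case complexity.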
The large-scale commercial ontology Cyc provides a rich formalisation of common sense, and involves many kinds of logics: first-order, higher-order, contextualised and non-monotonic logics. The latter also play a role in the W3C standard RIF (Rule Interchange Format), which actually comprises a whole family of rule-based languages that try to capture the important features of the input languages of current industrial rule-based tools.

The Tower of Babel (image from Wikipedia)

The result is a Babylonian confusion of languages used in ontology engineering (Babel, incidentally, is also the name of an early conversion tool). Whilst certain relationships between some of the languages, such as logical translations, are well studied, others are only beginning to be investigated and understood, e.g. useful relations between OWL and RIF-PRD. Indeed, Tim Berners-Lee's original version of the Semantic Web layer cake featured a rich number of heterogeneous logical languages involved in the Semantic Web, as depicted in Figure 1.

Figure 1. Tim Berners-Lee's early version of the Semantic Web layer cake

The original vision of the Semantic Web [Berners-Lee et al., 2001] emphasised the role of intelligent agents, which combine information found on the Web to assist with complex tasks such as making an appointment with a nearby doctor who specialises in the user's current disease. Agents can rely on web services that solve specialised problems, and combine these in order to provide more powerful services. W3C standards and submissions for services include the Web Service Description Language (WSDL) for the specification of interfaces of single web services, the Web Service Choreography Description Language (WS-CDL) for the specification of the interplay of decentralised services, and others such as WSCL, WS-Transfer, WS-Eventing etc., not to mention important non-W3C languages such as BPEL and BPMN.
Again, this multitude of languages is only partly due to the idiosyncrasies of companies and organisations; more importantly, it is due to the different intended characteristics and features of the languages, which e.g. express different aspects of (composed) services.

Another reason for utilising different languages has become apparent with the expansion of the Linked Open Data cloud. The Linked Open Data cloud is particularly interesting in the context of so-called big data, i.e. data sets so large that their capture, storage, management, and processing is a challenge in itself [Big Data, 2012]. Big data might, for example, originate from large-scale scientific experiments, social networks, or sensor networks. In the following, we restrict ourselves to big data that has been made available as linked data. With the current state of the art, this means that such datasets will be described using vocabularies with a weak semantics (typically, a subset of RDFS plus a few hand-picked OWL constructs). Agents consuming these data sets require stronger semantics (without sacrificing scalability when used locally). Therefore, the required ontologies will be implemented and maintained in languages different from those used for the data sets. Again, different languages are used side by side to describe aspects of the same problem space.

Going beyond virtual agents and services, embodiment will play a greater role in the future. A rapidly evolving area is smart environments, which provide embodied services. Here, the issue of service description arises in a similar way as for web services, and some standards such as the Universal Remote Console (URC) have been defined. Moreover, the (social) interactions between such embodied services and human agents acting in such environments (e.g.
using intelligent dialogue systems) will clearly be a research topic of increasing importance in the future, especially in an aging society, and may be seen as the most challenging interoperability problem of all.

The Vision of Interoperability

This multitude of languages and endeavours brings about an interoperability problem, which other activities, including standardisation efforts, try to counteract and overcome. The diversity of current interoperability initiatives demonstrates, however, that there is currently no unified framework within which the various interoperability efforts themselves could be synchronised and orchestrated. We expect that, by 2022, a Rosetta Stone of interoperability, bridging this Babylonian confusion and extending Tim Berners-Lee's original vision, will have been found. This Rosetta Stone will ensure interoperability within and among the areas of knowledge engineering, services and devices. In each of these areas, interoperability will occur at various levels:

1. interoperability at the level of individual data, services, and devices;
2. interoperability at the level of models: ontologies (ontology alignment/integration), service descriptions (service matching), and device descriptions;
3. interoperability among different metamodels: ontology languages, service and device description languages.

Figure 2. The interoperability stack

The Rosetta Stone (image from Wikipedia)

With such systematic and flexible interoperability at, and across, all of these levels, one can integrate data stemming from different sources, using different schemas, and formulated in possibly different schema languages. This also means that translations occur at all three levels: for the left column of Figure 2, for example, we may need translation of data, translation of ontologies (ontology alignment), and translation of ontology languages. Then, much of the content written in hitherto unrelated languages can be connected.
We can concentrate on the content, services, and devices, and find out more easily whether different pieces of content, different services and different devices can be related and integrated in a meaningful way.

The Distributed Ontology Language (DOL)

The vision depicted in Figure 2 is quite broad, and its realisation involves efforts in several areas. As a first step towards the interoperability shown in the left column of Figure 2, we here sketch the Distributed Ontology Language (DOL), a metalanguage for ontology integration and interoperability which accepts the diverse reality found within the Semantic Web. The process of standardising DOL within ISO is due to be finished in 2015, and tool support is under way. DOL allows for maintaining one connected, distributed ontology per application instead of two or more separate, disconnected implementations and ontologies.

Although the foundations of the Semantic Web, IRIs and RDF graphs, are likely to be strong enough for integrating most desirable ontology languages in the foreseeable future, it is important to recognise that there will be a diversity of languages on top of RDF used to express ontologies. DOL is not "yet another ontology language"; rather, it provides a meta-level framework that integrates different ontology languages and makes them interoperable, regardless of whether their syntax is compatible with RDF, as long as their semantics can be formalised in a set-theoretic or institution-theoretic way [Mossakowski et al., 2012]. A distributed ontology consists of modules, which may be implemented in different DOL-conforming ontology languages, and which are interlinked by formal, logical links such as imports or theory interpretations, or by informal, non-logical alignments (as returned, e.g., by statistical matching procedures) [Kutz et al., 2010].
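To make this concrete, the following fragment sketches a distributed ontology combining an OWL module with a Common Logic module, linked by a theory interpretation and an informal alignment. All ontology and symbol names are hypothetical, and the keywords follow the draft DOL concrete syntax as supported by the Hets tool, so details may differ from the final standard.

```
%% sketch only: names hypothetical, keywords follow the draft DOL syntax

logic OWL
ontology Patients =
  Class: Patient SubClassOf: Person
  ObjectProperty: hasDisease
end

logic CommonLogic
ontology TemporalPatients =
  %% a first-order axiom beyond OWL: disease attribution indexed by time
  (forall (p d t)
    (if (hasDiseaseAt p d t) (hasDisease p d)))
end

%% heterogeneous logical link: the OWL module is interpreted in the
%% Common Logic module along the OWL -> Common Logic logic translation
interpretation PatientsInTime : Patients to TemporalPatients

%% informal, non-logical link, e.g. produced by a matching tool
alignment PatientPerson : Patients to Foaf =
  Patient = foaf:Person
end
```

Here the interpretation is a logical link that generates proof obligations (checked after translating the OWL module into Common Logic), while the alignment is a non-logical link that merely records correspondences without logical consequences.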
DOL is, to our knowledge, the first language that systematically supports the expression of such a collection of links; indeed, it is even the first such language for the subcase of purely homogeneous links (e.g. between two OWL ontologies). Heterogeneous logical links, i.e. links across ontology languages, are semantically backed by a graph of logic translations (towards more expressive logics) and projections (towards less expressive logics). Links, logical as well as non-logical ones, across ontology languages are syntactically backed by