A Protégé 4 Backend for Native OWL Persistence

We present a persistence layer for the native storage and manipulation of OWL ontologies on top of the OWL API, together with an integration of the first version of this persistence layer into the Protégé ontology engineering environment. This allows large ontologies to be handled efficiently within the Protégé 4 environment even if they do not fit in main memory. The approach is based on a direct mapping from native OWL constructs to database entries by utilising a framework for object-relational mappings.

1 Motivation

There are numerous reasons that demand a scalable persistence layer for Protégé, e.g. the capability to process large ontologies. Former versions of Protégé (3.x) come with persistence solutions that enable the database storage of ontologies. The evolution from version 3 to 4, and thus from the former frame-based architecture to an architecture that inherently supports the Web Ontology Language (OWL), leaves a gap concerning the persistence of ontologies. The decision to abandon the frame-based approach calls for a redesign of the former database storage format.

In our approach to an OWL persistence layer, we utilise the OWL API [2] object model for the storage of native OWL language constructs to derive an appropriate database schema. In particular, we focus on the axiomatic view given through the OWL API object model. We use an object-relational (O/R) mapping to realise native OWL persistence as a database backend for the OWL API. Since Protégé 4 builds on the OWL API, this persistence layer can readily be used as a database storage solution for Protégé.

Reusing the available store implementations for Protégé 3.x was not possible, as these implement the CLOS Meta-Object Protocol [4, Ch. 5/6]. The use of available triple stores, like Jena or Sesame, also seemed inappropriate, as they share the same drawbacks as the CLOS model on the database layer.
In particular, the persistence solutions for those models are realised with a single table, which results in a major performance loss.

A major design goal during the realisation was a non-invasive implementation that avoids changes to the OWL API elements and hence ensures full compatibility. Moreover, the approach is provided as a plug-in for Protégé and thus offers an additional option for ontology persistence without changing the Protégé core code.

In the following section we explain the advantages of a native OWL persistence layer and the reasons for the chosen database schema. Sec. 3 explains the integration of the database backend into Protégé 4. Sec. 4 presents the next development steps of the persistence solution and the conclusion.

2 Native OWL Persistence Layer

Native OWL persistence refers to a direct, one-to-one representation of OWL language constructs in an underlying storage layer. This is in contrast to triple-based storage solutions, where an OWL ontology is represented at the more fine-grained level of triples.
1 http://owlapi.sourceforge.net/
2 http://protege.stanford.edu/doc/design/jdbc_backend.html
3 http://jena.sourceforge.net/
4 http://www.openrdf.org/

Figure 1: UML structure showing complex classes (blue dashed area) and ABox (red dotted area) axioms. Classes stereotyped with «table» contribute a table to our schema. The diagram lists the following tables (fields in parentheses): OWLObject (id), OWLClass (uri), OWLNaryBooleanDescription (operands), OWLObjectComplementOf (operand), OWLObjectOneOf (values), OWLRestriction (property), OWLCardinalityRestriction (cardinality, filler), OWLQuantifiedRestriction (filler), OWLValueRestriction (value), OWLIndividual (uri, anon), OWLIndividualRelationShipAxiom (subject, property, object) and OWLClassAssertionAxiom (individual, description).

Avoiding the conversion to the triple structure is a major advantage, as this allows for faster storage and retrieval of OWL ontologies, saving the processing time for conversion. Moreover, querying the ontology can be sped up, as fewer (self-)joins on large triple structures are required. Furthermore, complex OWL expressions, like cardinalities, can be stored in a more compact way. To achieve this nativeness, we build the representation of an OWL ontology in our persistence layer on the OWL API object model as the basis for a mapping to database tables.
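The difference between the two storage styles can be illustrated with a minimal sketch that stores one qualified cardinality restriction both ways in an in-memory SQLite database. Everything named here is an assumption made for illustration: the tables `triples` and `cardinality_restriction`, the `kind` column, and the `ex:` identifiers are hypothetical and are not taken from the schema of the described implementation. The four-triple encoding follows the usual RDF serialisation of a qualified minimum cardinality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE triples (s TEXT, p TEXT, o TEXT);
CREATE TABLE cardinality_restriction (
    id INTEGER PRIMARY KEY,
    kind TEXT,            -- hypothetical discriminator for the concrete class
    property TEXT,
    cardinality INTEGER,
    filler TEXT);
""")

# Triple encoding: one blank node (_:r1) described by four triples.
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", "ex:hasChild"),
    ("_:r1", "owl:minQualifiedCardinality", "2"),
    ("_:r1", "owl:onClass", "ex:Person"),
])

# Native encoding: the same restriction as a single row.
conn.execute(
    "INSERT INTO cardinality_restriction VALUES (?, ?, ?, ?, ?)",
    (1, "ObjectMin", "ex:hasChild", 2, "ex:Person"))

# Reassembling the restriction from triples needs three self-joins ...
TRIPLE_QUERY = """
SELECT p.o, c.o, f.o
FROM triples t
JOIN triples p ON p.s = t.s AND p.p = 'owl:onProperty'
JOIN triples c ON c.s = t.s AND c.p = 'owl:minQualifiedCardinality'
JOIN triples f ON f.s = t.s AND f.p = 'owl:onClass'
WHERE t.p = 'rdf:type' AND t.o = 'owl:Restriction'
"""
# ... whereas the native table answers the same question without any join.
NATIVE_QUERY = """
SELECT property, cardinality, filler
FROM cardinality_restriction
WHERE kind = 'ObjectMin'
"""

triple_result = conn.execute(TRIPLE_QUERY).fetchall()
native_result = conn.execute(NATIVE_QUERY).fetchall()
```

Both queries return the property, the cardinality and the filler of the restriction. The triple variant touches the `triples` table four times, while the native variant reads a single row from a much smaller, more selective table.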
We propose the use of an object-relational (O/R) mapping approach for the native persistence of OWL ontologies: the OWL axioms of an ontology are mapped directly to a database schema, which provides an axiomatic view at the database level. As a consequence, the entities included in these axioms have to be persisted as well. With cascaded insertions, a feature included in most common O/R mapping tools, all entities and axioms referenced by an inserted ontology axiom are persisted along with it; this reflects the axiomatic view on the ontology. Relationships between objects in an ontology are persisted as foreign key references.

We also gain several benefits from the use of a relational database management system (RDBMS). As stated in [5], we can for example use the built-in support for transactions, access control, logging and recovery. Beyond that, the query optimisation techniques of modern RDBMSs can be used to optimise query performance and thus provide better scalability compared to specialised RDF stores [10].

Schema Representation through O/R-Mapping

We use the OWL API object model as the conceptual basis of our implementation. This close integration allows the user to interact with the persisted ontology via the OWL API in the same way as with in-memory ontologies. The solution has been seamlessly integrated into the OWL API (refer to Sec. 3). Ontology modularisation can be achieved by using the OWL import mechanism, which in our implementation even allows mixing in-memory and persisted ontologies.

The O/R mapping of OWL API objects to database tables, which constitutes our database schema, was created in a bottom-up approach. The axiom types, and respectively the objects that are mapped, are derived directly from the OWL API object model. Furthermore, we added support for the persistence of SWRL rules in the same manner, as the OWL API supports these as well.
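The cascaded insertion and the foreign key references described earlier in this section can be sketched by hand as follows. This is only an imitation of what an O/R mapping tool does automatically, written against an in-memory SQLite database. The table names (`owl_class`, `owl_individual`, `class_assertion_axiom`), the helper functions and the `ex:` identifiers are hypothetical and do not reproduce the actual schema or API of the implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE owl_class (id INTEGER PRIMARY KEY, uri TEXT UNIQUE NOT NULL);
CREATE TABLE owl_individual (id INTEGER PRIMARY KEY, uri TEXT UNIQUE NOT NULL);
CREATE TABLE class_assertion_axiom (
    id INTEGER PRIMARY KEY,
    individual INTEGER NOT NULL REFERENCES owl_individual(id),
    description INTEGER NOT NULL REFERENCES owl_class(id));
""")


def save_entity(table, uri):
    """Return the row id for `uri`, inserting the entity first if it is new."""
    row = conn.execute(f"SELECT id FROM {table} WHERE uri = ?", (uri,)).fetchone()
    if row:
        return row[0]
    return conn.execute(f"INSERT INTO {table} (uri) VALUES (?)", (uri,)).lastrowid


def save_class_assertion(individual_uri, class_uri):
    """Persist the axiom and, by cascade, every entity it references."""
    with conn:  # one transaction for the axiom and its entities
        individual_id = save_entity("owl_individual", individual_uri)
        class_id = save_entity("owl_class", class_uri)
        cursor = conn.execute(
            "INSERT INTO class_assertion_axiom (individual, description) "
            "VALUES (?, ?)",
            (individual_id, class_id))
    return cursor.lastrowid


# The caller only inserts axioms; the entities follow implicitly.
save_class_assertion("ex:alice", "ex:Person")
save_class_assertion("ex:bob", "ex:Person")
```

After the two calls, the class `ex:Person` is stored once and referenced twice, so the axiom rows carry only foreign keys and no duplicated entity data.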
The concrete classes for axioms and entities of the OWL API object model, together with their corresponding interfaces and abstract classes, constitute a class hierarchy. Hence our O/R mapping has to be hierarchic as well. To realise this hierarchic mapping, we use the “One Class One Table” pattern mixed with the “One Inheritance Tree One Table” pattern [3]. The resulting database schema consists of 56 tables representing this hierarchic structure. We tried to minimise the overall number of tables by coalescing class tables that share the same set of fields and the same parent class into a single table.

The majority of tables in our schema are required for modelling TBox axioms, e.g. cardinality, range and domain restrictions or complex class descriptions, cf. Fig. 1. The ABox of the ontology constitutes a smaller fraction of our schema. It consists mainly of the entity table for the individuals, named OWLIndividual, which stores the URIs, a type-of relation (OWLClassAssertionAxiom) asserting classes to individuals, and a table representing the relationships of an individual (OWLIndividualRelationshipAxiom). Figure 1 shows a simplified schema for these axioms (red subgraph). Interestingly, the latter table has three fields (subject, property and object), resembling a triple structure as proposed in [1].

5 E.g. cardinality restrictions are stored as 4 triples using a state-of-the-art triple store.
6 Our implementation is based on Hibernate (http://www.hibernate.org).

For proper support of inheritance it is necessary to have a single table (OWLObject) containing the primary keys of all objects of the ontology and their types. In consequence this table grows very fast, leading to slower insertion performance. Vertical partitioning could be a way to reduce this performance loss. In addition to the OWL API object model hierarchy, we introduce an extra table containing meta information about the ontology, e.g. its URI.
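The hierarchic mapping can be sketched for one branch of the class hierarchy as follows. This is a minimal illustration under stated assumptions, not the schema generated by the implementation. The snake_case table names, the `type` column, and storing the property and filler as plain text instead of foreign keys are simplifications chosen here. The root table plays the role of OWLObject, and the last table shows the coalescing of several concrete cardinality restriction classes into one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Root table: one row per persisted object, holding its key and concrete type.
CREATE TABLE owl_object (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
-- One table per class level: each level adds only its own fields.
CREATE TABLE owl_restriction (
    id INTEGER PRIMARY KEY REFERENCES owl_object(id),
    property TEXT NOT NULL);
-- Coalesced table: all cardinality restriction classes share these fields.
CREATE TABLE owl_cardinality_restriction (
    id INTEGER PRIMARY KEY REFERENCES owl_restriction(id),
    cardinality INTEGER NOT NULL,
    filler TEXT);
""")


def save_cardinality_restriction(concrete_type, prop, cardinality, filler):
    """Insert one restriction as one row per level of the class hierarchy."""
    with conn:  # single transaction: all three rows or none
        object_id = conn.execute(
            "INSERT INTO owl_object (type) VALUES (?)", (concrete_type,)).lastrowid
        conn.execute(
            "INSERT INTO owl_restriction VALUES (?, ?)", (object_id, prop))
        conn.execute(
            "INSERT INTO owl_cardinality_restriction VALUES (?, ?, ?)",
            (object_id, cardinality, filler))
    return object_id


def load(object_id):
    """Reassemble an object by joining the tables along its inheritance path."""
    return conn.execute("""
        SELECT o.type, r.property, c.cardinality, c.filler
        FROM owl_object o
        JOIN owl_restriction r ON r.id = o.id
        JOIN owl_cardinality_restriction c ON c.id = o.id
        WHERE o.id = ?""", (object_id,)).fetchone()


first = save_cardinality_restriction(
    "OWLObjectMinCardinalityRestriction", "ex:hasChild", 2, "ex:Person")
second = save_cardinality_restriction(
    "OWLDataMaxCardinalityRestriction", "ex:hasAge", 1, "xsd:integer")
```

Every insertion adds a row to the root table regardless of the concrete class, which is why that table grows fastest. The two different restriction classes end up in the same coalesced table and are told apart only by the type recorded in the root table.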
Redundancy in the persisted information is very rare.

Related Approaches

Compared to other database persistence solutions for OWL ontologies, like IBM SOR [5] or Owlgres [9], it is remarkable that our approach yields a quite similar database schema, even though we focus on direct manipulation whereas those systems focus on reasoning and querying tasks. Since our implementation already supports OWL 2 [7], as used in the OWL API, we have a slightly higher count of axiom types and tables. Furthermore, we performed no conversion of semantically equivalent axioms, which supports the concept of the OWL API as an ontology manipulation interface. We expect our suggested schema to perform similarly when used for those tasks. An advantage of our solution compared to the basic database persistence of triples is the reduction of joins in queries and a higher selectivity on tables.

The choice of an O/R mapping as an intermediate layer allows us to vary several design decisions, in particular details of the database schema and caching, at deploy or run time. It should be mentioned that our database schema is only one of several possible schemat