DBpedia is a community effort that has created the most important cross-domain dataset in RDF, a focal point of the Linked Open Data (LOD) cloud. At its core is a set of declarative mappings that extract data from Wikipedia infoboxes and tables into RDF. However, while DBpedia focuses on publishing knowledge in a machine-readable way, little attention has been paid to the benefits of supporting machine updates. This greatly restricts the possibilities of automatic curation of DBpedia data that could be semi-automatically propagated to Wikipedia, and also prevents maintainers from evaluating the impact of their edits on the consistency of knowledge. Excluding the DBpedia taxonomy from the editing cycle is a major drawback, which we aim to address. This paper starts a discussion of DBpedia, making a case for a benchmark for Ontology Based Data Management (OBDM). As we show, although based on fairly restricted mappings (which we cast as a variant of nested tgds here) and a minimalistic TBox language, accommodating DBpedia updates is intricate from different perspectives, ranging from conceptual ones (what is an adequate semantics for DBpedia SPARQL updates?) to challenges related to user interface design.

1 Formalization of the OBDM Setting

We define the declarative WikiDBpedia framework (WDF) as a pair (M, T), where M is a schema mapping between the structured Wiki data and DBpedia [8], and T is a DBpedia TBox. Specifically, M is a triple (W, T, Σ) based on a set Σ of nested tuple generating dependencies (tgds) [5, 7] of a special form, translating the Wiki schema W into an ABox over the DBpedia vocabulary T. A WDF instance of a WDF (M, T) is a Wiki instance I satisfying W. We now specify the language used to formalize the TBox T, the tgd language of Σ, and the Wiki schema W.

DBpedia ontology language. DBpedia uses a fragment of the OWL 2 RL profile, which we call DBP.
The fragment includes the OWL keywords subClassOf (which we abbreviate as sc) and subPropertyOf (sp), domain and range (respectively, dom and rng), inversePropertyOf (inv), disjointWith (dw), propertyDisjointWith (pdw) and functionalProperty (func). At present, only data properties are declared as functional in DBpedia and no roles are declared inverse functional. Inference rules for the ontology language DBP are summarized in Fig. 1. Application of these rules terminates and thus allows for materialization of the ABox.

    A sc B   :  A(x) → B(x)
    P sp Q   :  P(x, y) → Q(x, y)
    P dom A  :  P(x, y) → A(x)
    P rng A  :  P(x, y) → A(y)
    P inv Q  :  P(x, y) → Q(y, x)
    A dw B   :  A(x) ∧ B(x) → ⊥
    P pdw Q  :  P(x, y) ∧ Q(x, y) → ⊥
    P func   :  P(x, y) ∧ P(x, z) ∧ y ≠ z → ⊥

    Fig. 1. Rule representation of DBP.

* An extended version of this paper including additional details is available in [3].

Infobox schema W. Each Wiki page is identified by a URI, which translates to a subject IRI in DBpedia. A page can contain several infoboxes of distinct types. We model this semistructured data store using a relational schema W with two ternary relations Wi = UTI and Wd = IPV, where attribute I stores infobox identifiers, U the page URI, T the infobox type, and P and V the property names and values, respectively. That is, unlike the real Wiki, where infoboxes may belong to different pages or be separate tables of distinct types, we use an auxiliary surrogate key I to horizontally partition the single key-value store Wd. Our schema W assumes the key constraints UT → I and IP → V, and the inclusion dependency Wd[I] ⊆ Wi[I]. Two kinds of values are allowed in W: labelled nulls and constants, whereby only constants are transferred to DBpedia by the mappings, as explained below.

Mapping constraints Σ. The specification [1] distinguishes several types of DBpedia mappings, summarized in Table 1 along with their counts in the English DBpedia.
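The materialization under the rules of Fig. 1 can be illustrated with a small fixpoint computation. The following is only a minimal sketch under stated assumptions: the tuple encoding of facts and axioms, the function names `dbp_closure` and `dbp_consistent`, and the toy TBox and ABox are all hypothetical and are not taken from DBpedia or its tooling.

```python
def dbp_closure(abox, tbox):
    """Saturate an ABox under the positive rules of Fig. 1 (sc, sp, dom, rng, inv).

    Assumed encoding: a fact is (A, x) for a class or (P, x, y) for a property;
    an axiom is a tuple such as ("sc", "A", "B") or ("func", "P").
    """
    facts = set(abox)
    while True:
        derived = set()
        for kind, lhs, *rest in tbox:
            for fact in facts:
                if fact[0] != lhs:
                    continue
                if kind == "sc" and len(fact) == 2:
                    derived.add((rest[0], fact[1]))            # A(x) -> B(x)
                elif kind == "sp" and len(fact) == 3:
                    derived.add((rest[0], fact[1], fact[2]))   # P(x,y) -> Q(x,y)
                elif kind == "dom" and len(fact) == 3:
                    derived.add((rest[0], fact[1]))            # P(x,y) -> A(x)
                elif kind == "rng" and len(fact) == 3:
                    derived.add((rest[0], fact[2]))            # P(x,y) -> A(y)
                elif kind == "inv" and len(fact) == 3:
                    derived.add((rest[0], fact[2], fact[1]))   # P(x,y) -> Q(y,x)
        if derived <= facts:
            # No new facts: fixpoint reached. No rule invents constants,
            # so the loop always terminates.
            return facts
        facts |= derived


def dbp_consistent(facts, tbox):
    """Check the rules of Fig. 1 that derive ⊥ (dw, pdw, func)."""
    for kind, lhs, *rest in tbox:
        if kind in ("dw", "pdw"):
            left = {f[1:] for f in facts if f[0] == lhs}
            right = {f[1:] for f in facts if f[0] == rest[0]}
            if left & right:
                return False
        elif kind == "func":
            seen = {}
            for f in facts:
                if f[0] == lhs and len(f) == 3:
                    if seen.setdefault(f[1], f[2]) != f[2]:
                        return False
    return True


# Hypothetical toy TBox and ABox, for illustration only.
tbox = [("sc", "Pope", "Cleric"), ("sc", "Cleric", "Person"),
        ("dom", "predecessor", "PersonFunction"),
        ("dw", "Person", "Place"), ("func", "birthName")]
abox = {("Pope", "u1"), ("predecessor", "y1", "y0")}
closed = dbp_closure(abox, tbox)
```

Consistency is checked only after saturation, since a clash may involve derived facts (here, adding `("Place", "u1")` clashes with the derived `("Person", "u1")`).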
All these mappings can be represented as nested tgds [5, 7] extended with negation and constraints in the antecedents, to capture conditional mappings, and with interpreted functions in the conclusions of implications, to handle calculated mappings over, e.g., dates or geo coordinates. A crucial limitation of the mapping language (which we call DBpedia tgds) is the impossibility of comparisons between infobox property values. The infobox type Wi.T and the property names Wd.P must be specified explicitly. For a Wiki instance I, by M(I) we denote the chase of I with the tgds in M [7], and by M ◦ T(I) the closure of M(I) under the rules in Fig. 1.

Example 1. A tgd formalizing a French DBpedia mapping for clergy:

    ∀U ∀I ( Wi(U, 'fr:Prélat catholique', I) →
      ( ( Wd(I, 'titre', 'Pape') →
            ∃Y ( Pope(U) ∧ occupation(U, Y) ∧ PersonFunction(Y) ∧ title(Y, 'Pape')    // "Intermediate node mapping"
                 ∧ ... ∧ ∀X ( Wd(I, 'prédécesseur pape', X) → predecessor(Y, X) ) ) )
        ...
        ∧ ( Wd(I, 'titre', 'Prêtre') → Priest(U) )
        ∧ ( ¬Wd(I, 'titre', 'Pape') ∧ ... ∧ ¬Wd(I, 'titre', 'Prêtre') → Cleric(U) )    // "otherwise"
        ∧ ∀X ( Wd(I, 'nom', X) → foaf:name(U, X) )
        ...
        ∧ ∀X ( Wd(I, 'nom naissance', X) → birthName(U, X) ) ) )

The specification stipulates that conditions are evaluated in the natural order, and thus every subsequent condition has to include the negation of all preceding conditions. In our case, this is only illustrated by the last, default ("otherwise") case, since the conditions are mutually exclusive. Note also that no universally quantified variable besides the page URI U and the technical infobox identifier I (i.e., no variable representing an infobox property, called X in the example) can occur in more than two Wd atoms.

One further particularity of the chase with tgds is the handling of existentially quantified variables. A usual approach is to instantiate such variables by null values, which could be blank nodes in the case of RDF.
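To make the reading of Example 1 concrete, the explicitly shown parts of the tgd can be evaluated over the relational encoding Wi, Wd. This is a hedged sketch, not the DBpedia extraction framework: the function name `apply_clergy_mapping`, the set-of-tuples encoding, the page and infobox identifiers, and the `__1` suffix used to name the node standing for Y are assumptions made for illustration, and the conditions elided by "..." in the example are simply left out.

```python
def apply_clergy_mapping(wi, wd):
    """Evaluate the visible parts of the Example 1 tgd.

    wi: set of (U, T, I) triples; wd: set of (I, P, V) triples.
    Returns ABox facts in the form (A, x) or (P, x, y).
    """
    facts = set()
    for (u, t, i) in wi:
        if t != "fr:Prélat catholique":
            continue
        props = {(p, v) for (j, p, v) in wd if j == i}
        titles = {v for (p, v) in props if p == "titre"}

        if "Pape" in titles:
            # Existential variable Y: instantiated by an identifier derived
            # from the page URI (hypothetical naming scheme).
            y = f"{u}__1"
            facts |= {("Pope", u), ("occupation", u, y),
                      ("PersonFunction", y), ("title", y, "Pape")}
            for (p, v) in props:
                if p == "prédécesseur pape":
                    facts.add(("predecessor", y, v))
        if "Prêtre" in titles:
            facts.add(("Priest", u))
        # "otherwise" case: only the two conditions shown in the example are
        # negated here; the elided ones are not modelled.
        if "Pape" not in titles and "Prêtre" not in titles:
            facts.add(("Cleric", u))

        # Unconditional property mappings.
        for (p, v) in props:
            if p == "nom":
                facts.add(("foaf:name", u, v))
            elif p == "nom naissance":
                facts.add(("birthName", u, v))
    return facts


# Hypothetical Wiki instance, for illustration only.
wi = {("page:A", "fr:Prélat catholique", "i1"),
      ("page:C", "fr:Prélat catholique", "i2")}
wd = {("i1", "titre", "Pape"), ("i1", "nom", "A"),
      ("i1", "prédécesseur pape", "page:B"),
      ("i2", "nom", "C")}
facts = apply_clergy_mapping(wi, wd)
```

In this toy instance, `page:A` falls under the first condition and `page:C`, having no 'titre' value, under the default case.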
The strategy followed by DBpedia is, however, different: instead of blank nodes, the chase produces fresh IRIs, avoiding clashes with existing page URIs. Already the following problem is worst-case intractable for WDFs:

ABox source consistency ASCONS [2, 6]. Parameter: WDF (M, T). Input: ABox A. Test if A ∪ T ⊭ ⊥ and if a Wiki instance I exists such that M ◦ T(I) = A.

    TYPE OF MAPPINGS    DECLARED   DESCRIPTION
    Template                 958   Map Wiki templates to DBpedia classes.
    Property              19,972   Map Wiki template properties to DBpedia properties.
    IntermediateNode         107   Generate a blank node with a URI.
    Conditional               31   Depend on template properties and their values.
    Calculate                 23   Compute a function over two properties.
    Date                     106   Mappings that generate a starting and ending date.

    Table 1. Description of DBpedia (English) mappings.

Proposition 1. ASCONS is NP-complete.¹

2 Towards the DBpedia OBDM

The ABox source consistency problem demonstrates one source of complexity for DBpedia update translations, namely accommodating a set of insertions exactly (up to the facts derivable via a TBox).

Definition 1 (Translation of an infobox update). Let I be a Wiki instance, e = (e−, e+) be an infobox update and let M be a DBpedia mapping. The translation M_I(e) of e w.r.t. M and I is a DBpedia update u = (u−, u+) where u− = M ◦ T(I) \ M ◦ T(e(I)) and u+ = M ◦ T(e(I)) \ M ◦ T(I).

The inverse translation, casting a DBpedia update as a Wiki update, can be defined similarly, with the difference that such a translation is often not unique or does not exist at all, for various reasons: (i) many-to-many relations between Wiki and RDF properties: modifying just a single fact can be impossible; (ii) updates can cause inconsistencies both directly, w.r.t. the previous DBpedia knowledge, and indirectly, by triggering a conditional mapping rule that causes already existing infobox properties to be transferred to DBpedia, resulting in a clash. Therefore, we define translations based on containment.

Definition 2 (Update containment).
The syntactic containment u1 ⊆ u2 holds when u1+ ⊆ u2+ and u1− ⊆ u2−. This containment is applicable to pairs of Wiki updates. Given an instance I of a WDF (M, T), the WDF containment u ⊆_I e between the Wiki update e and the DBpedia update u holds if u ⊆ M_I(e). The proper update containment relations ⊂ and ⊂_I are defined analogously. For the heterogeneous pair u, e of updates as above, we say that e minimally contains u, written u ⊆_I^min e, if (i) e ⊭_I ⊥ and (ii) for every Wiki update e′ with e′ ⊂ e, either u ⊈_I e′ or e′ ⊨_I ⊥; if e′ ⊂ e implies u ⊈_I e′ (that is, the option e′ ⊨_I ⊥ is eliminated), e is said to faithfully contain u, written u ⊆_I^fth e. We also use u ⊆_I^ex e and u =_I e as shorthands for (u ⊆ M_I(e)) ∧ (M_I(e) ⊆ u).

Intuitively, minimal containment ensures that all insertions and deletions performed by e are necessary either to implement u or to restore the ABox consistency after implementing u. In contrast, faithful containment deprecates extending u purely for the sake of restoring consistency. The notions of minimal and faithful containment adapt the semantics considered in [4] in a much simpler setting of SPARQL ABox updates, where no mappings were present. Using the above definition, the decision version of the OBDM [9] problem can be defined as follows:

Source revision SREV_θ for the WDF (M, T) and θ ∈ {min, fth, ex}. Input: WDF instance I, DBpedia update u, Wiki update e. Test if u ⊆_I^θ e holds.

¹ See [3] for a proof sketch.

The source revision problem is a special case of the belief revision problem tailored to the OBDM setting, in which the mapping and the TBox are considered fixed and the ABox is derived: that is, only the infobox data can actually be modified.

3 Discussion and Practical Outlook

OBDM-related problems tend to be intractable w.r.t. worst-case complexity even for simple mapping and ontology languages, such as those underlying DBpedia.
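As a concrete reading of Definitions 1 and 2, the translation of an infobox update and the containment tests can be sketched with plain set operations. The sketch rests on several assumptions that are not fixed by the text: an update (e−, e+) is applied by removing e− and then adding e+, the composed mapping M ◦ T is passed in as an ordinary function, and `toy_mapping` is a hypothetical stand-in for it that uses the infobox identifier as subject.

```python
def apply_update(instance, update):
    """Apply (minus, plus) to a set of facts: deletions first, then insertions (assumed order)."""
    minus, plus = update
    return (instance - minus) | plus


def translate(instance, e, mapping):
    """Definition 1: u- = M∘T(I) \\ M∘T(e(I)) and u+ = M∘T(e(I)) \\ M∘T(I)."""
    before = mapping(instance)
    after = mapping(apply_update(instance, e))
    return (before - after, after - before)


def contained(u1, u2):
    """Syntactic containment: componentwise inclusion of deletions and insertions."""
    return u1[0] <= u2[0] and u1[1] <= u2[1]


def wdf_contained(u, e, instance, mapping):
    """WDF containment: u is contained in the translation of e w.r.t. I."""
    return contained(u, translate(instance, e, mapping))


def exactly_contained(u, e, instance, mapping):
    """The 'ex' case: containment holds in both directions."""
    translated = translate(instance, e, mapping)
    return contained(u, translated) and contained(translated, u)


def toy_mapping(wd):
    """Hypothetical stand-in for M∘T: maps (I, 'nom', V) to foaf:name(I, V)."""
    return {("foaf:name", i, v) for (i, p, v) in wd if p == "nom"}


# Hypothetical instance and updates, for illustration only.
instance = {("i1", "nom", "A"), ("i1", "titre", "Pape")}
e = ({("i1", "nom", "A")}, {("i1", "nom", "B")})   # Wiki update: rename A to B
u = (set(), {("foaf:name", "i1", "B")})            # DBpedia update: insert the new name
translated = translate(instance, e, toy_mapping)
```

Here u is contained in the translation of e but not exactly, because e also deletes the old name. The minimal and faithful variants additionally quantify over all proper sub-updates e′ of e and need a consistency test, so they are omitted from this sketch.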
Our initial experiments with the translation of SPARQL updates in this setting (discussed in [3]) demonstrate, however, that worst-case scenarios leading to intractability of update handling are seldom realized in the current DBpedia version. From a practical point of view, the following considerations appear crucial. First, there is the inherent ambiguity of update translation; m
[1] Serge Abiteboul, et al. Complexity of answering queries using materialized views. PODS, 1998.
[2] Paolo Papotti, et al. Nested mappings: schema mapping reloaded. VLDB, 2006.
[3] Jens Lehmann, et al. DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 2015.
[4] Axel Polleres, et al. Updating Wikipedia via DBpedia Mappings and SPARQL. ESWC, 2017.
[5] Ali Moallemi, et al. Recovering Exchanged Data. PODS, 2015.
[6] Emanuel Sallinger, et al. Nested dependencies: structure and reasoning. PODS, 2014.
[7] Diego Calvanese, et al. Handling Inconsistencies Due to Class Disjointness in SPARQL Updates. ESWC, 2016.
[8] Maurizio Lenzerini. Ontology-based data management. CIKM '11, 2011.