Release Early, Release Often: Predicting Change in Versioned Knowledge Organization Systems on the Web

The Semantic Web is built on top of Knowledge Organization Systems (KOS) (vocabularies, ontologies, concept schemes) that provide structured, interoperable and distributed access to Linked Data on the Web. The maintenance of these KOS over time has produced a number of KOS version chains: sequences of unique version identifiers, each assigned to a unique state of a KOS. However, the release of new KOS versions poses challenges to both KOS publishers and users. For publishers, updating a KOS is a knowledge-intensive task that requires a lot of manual effort, often implying deep deliberation on the set of changes to introduce. For users that link their datasets to these KOS, a new version compromises the validity of their links, often with further ramifications. In this paper we describe a method to automatically detect which parts of a Web KOS are likely to change in the next version, using supervised learning on past versions in the KOS version chain. We use a set of ontology change features to model and predict change in arbitrary Web KOS. We apply our method to 139 varied datasets systematically retrieved from the Semantic Web, obtaining robust results in correctly predicting change. To illustrate the accuracy, genericity and domain independence of the method, we study the relationship between its effectiveness and several characterizations of the evaluated datasets, finding that predictors like the number of versions in a chain and their release frequency have a fundamental impact on the predictability of change in Web KOS. Consequently, we argue for adopting a "release early, release often" philosophy in Web KOS development cycles.
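To make the supervised-learning setup concrete, the following minimal sketch shows one way a version chain could be turned into labeled training examples. It is an illustration under stated assumptions, not the method of the paper: the representation of a version as a mapping from concept to its set of direct children, the two toy features, and the names `Version`, `changed` and `build_examples` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: one KOS version maps each concept
# to the set of its direct children.
Version = Dict[str, Set[str]]


def changed(concept: str, old: Version, new: Version) -> bool:
    """True if the concept's children differ between two versions
    (this includes the concept being added or removed)."""
    return old.get(concept) != new.get(concept)


def build_examples(chain: List[Version]) -> List[Tuple[str, List[int], bool]]:
    """Build (concept, features, label) triples from a version chain.

    For each version i that has both a predecessor and a successor:
      features = [number of children in version i,
                  1 if the concept changed between i-1 and i, else 0]
      label    = whether the concept changes between i and i+1.
    The features are illustrative stand-ins, not the paper's feature set.
    """
    examples = []
    for i in range(1, len(chain) - 1):
        prev, cur, nxt = chain[i - 1], chain[i], chain[i + 1]
        for concept, children in cur.items():
            features = [len(children), int(changed(concept, prev, cur))]
            label = changed(concept, cur, nxt)
            examples.append((concept, features, label))
    return examples
```

The resulting triples could be passed to any off-the-shelf classifier. Note that a chain of n versions yields examples from only n - 2 of them, so longer chains provide more training data.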
