Revealing the Conceptual Schemas of RDF Datasets - Extended Abstract

Subhi Issa, Pierre-Henri Paris, Fayçal Hamdi, Samira Si-said Cherfi
CEDRIC, Conservatoire National des Arts et Métiers, 292 Rue Saint Martin, Paris, France
{subhi.issa,faycal.hamdi,samira.cherfi}@cnam.fr, pierre-henri.paris@upmc.fr

ABSTRACT. This paper is an extended abstract of our work published at CAISE’20. The full paper is available at https://doi.org/10.1007/978-3-030-21290-2_20. Thanks to their semantic richness, variety and fine granularity, RDF-based datasets are increasingly used by both research and business communities. However, these datasets suffer from a lack of completeness. Their content evolves continuously, and data contributors are only loosely constrained by the vocabularies and schemas associated with the data sources.
In the context of the Web of Data and user-generated content, the conceptual schema is implicit: each data contributor has an implicit personal model that the other contributors do not know. Consequently, revealing a meaningful conceptual schema is a challenging task that should take into account both the data and its intended usage. In this paper, we propose a completeness-based approach for revealing the conceptual schemas of RDF data. We combine quality evaluation and data mining approaches to find a conceptual schema for a dataset that meets user expectations regarding data completeness constraints. To this end, we propose LOD-CM, a web-based completeness demonstrator for linked datasets.
RÉSUMÉ. Thanks to their semantic richness, variety and fine granularity, RDF-based datasets are increasingly used by researchers and organizations. However, these datasets suffer from a lack of completeness. Their content evolves continuously, and contributors are not required to follow a precise vocabulary and schema when publishing their data. In this article, we propose a completeness-based approach for revealing the conceptual schemas of RDF data. We combine quality evaluation and data mining approaches to find a conceptual schema for a dataset that meets user expectations in terms of data completeness. To this end, we propose LOD-CM, a completeness demonstrator for linked datasets.
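The following Python sketch makes the idea of a completeness constraint concrete. It shows one simplified reading: a property is kept in the revealed schema of a class when the fraction of that class's instances carrying the property reaches a threshold chosen by the user. This is an assumption for illustration only and not the LOD-CM algorithm. The full paper combines completeness evaluation with a data mining step, which is not reproduced here. The toy triples, the `ex:`/`dbo:` prefixes, and the helper names `property_completeness` and `reveal_schema` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy dataset of (subject, predicate, object) triples; not from the paper.
TRIPLES = [
    ("ex:a", "rdf:type", "dbo:Film"),
    ("ex:a", "dbo:director", "ex:d1"),
    ("ex:a", "dbo:runtime", "90"),
    ("ex:b", "rdf:type", "dbo:Film"),
    ("ex:b", "dbo:director", "ex:d2"),
    ("ex:c", "rdf:type", "dbo:Film"),
    ("ex:c", "dbo:director", "ex:d3"),
    ("ex:c", "dbo:budget", "1000"),
    ("ex:c", "dbo:runtime", "120"),
]

RDF_TYPE = "rdf:type"


def property_completeness(triples, cls):
    """Return, for each property used by instances of `cls`, the fraction of
    those instances that have at least one value for it (a simplified,
    hypothetical completeness measure)."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    if not instances:
        return {}
    # Collect the distinct properties of each instance (rdf:type excluded).
    props_by_subject = defaultdict(set)
    for s, p, _ in triples:
        if s in instances and p != RDF_TYPE:
            props_by_subject[s].add(p)
    # Count how many instances use each property at least once.
    counts = defaultdict(int)
    for props in props_by_subject.values():
        for p in props:
            counts[p] += 1
    return {p: n / len(instances) for p, n in counts.items()}


def reveal_schema(triples, cls, threshold):
    """Keep the properties of `cls` whose completeness is at least `threshold`."""
    completeness = property_completeness(triples, cls)
    return sorted(p for p, c in completeness.items() if c >= threshold)


if __name__ == "__main__":
    print(reveal_schema(TRIPLES, "dbo:Film", 0.6))  # ['dbo:director', 'dbo:runtime']
    print(reveal_schema(TRIPLES, "dbo:Film", 0.9))  # ['dbo:director']
```

Lowering the threshold admits sparser properties such as `dbo:runtime` (present on two of three films) into the schema. Raising it keeps only properties shared by nearly all instances. Exposing this trade-off to the user is what lets a revealed schema reflect their completeness expectations.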