Revealing the Conceptual Schemas of RDF Datasets

RDF-based datasets, thanks to their semantic richness, variety and fine granularity, are increasingly used by both research and business communities. However, these datasets suffer from a lack of completeness, because their content evolves continuously and data contributors are only loosely constrained by the vocabularies and schemas of the data sources. Conceptual schemas have long been recognized as a key mechanism for understanding and dealing with complex real-world systems. In the context of the Web of Data and user-generated content, the conceptual schema is implicit: each data contributor works from a personal model that is not known to the other contributors. Consequently, revealing a meaningful conceptual schema is a challenging task that must take into account both the data and its intended usage. In this paper, we propose a completeness-based approach for revealing conceptual schemas of RDF data. We combine quality evaluation and data mining approaches to find a conceptual schema for a dataset that meets user expectations regarding data completeness constraints. To this end, we propose LOD-CM, a web-based completeness demonstrator for linked datasets.
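As a rough illustration of how a completeness constraint can drive schema discovery, the sketch below treats each instance of a class as the set of properties it uses and keeps the property sets shared by at least a user-chosen fraction of instances. The abstract does not name the mining algorithm, so the level-wise (Apriori-style) search, the function names, and the sample data are all assumptions made for this example, not the method implemented in LOD-CM.

```python
from itertools import combinations


def frequent_property_sets(instances, min_support):
    """Map each property set to its support, keeping supports >= min_support.

    `instances` is a list of sets of property names (hypothetical input shape).
    Uses a plain level-wise search; this is an illustrative choice only.
    """
    n = len(instances)
    if n == 0:
        return {}
    properties = sorted(set().union(*instances))
    result = {}
    current = [frozenset([p]) for p in properties]
    while current:
        size = len(current[0]) + 1
        kept = []
        for candidate in current:
            # Support = fraction of instances that carry every property.
            support = sum(1 for inst in instances if candidate <= inst) / n
            if support >= min_support:
                result[candidate] = support
                kept.append(candidate)
        # Join surviving sets pairwise to build candidates one property larger.
        next_level = {a | b for a, b in combinations(kept, 2) if len(a | b) == size}
        current = sorted(next_level, key=sorted)
    return result


def maximal_sets(frequent):
    """Keep only the frequent sets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]


# Hypothetical instances of a "Film" class, each reduced to its property names.
instances = [
    {"title", "director", "runtime"},
    {"title", "director"},
    {"title", "director", "runtime"},
    {"title", "starring"},
]

frequent = frequent_property_sets(instances, min_support=0.5)
print(maximal_sets(frequent))
```

Raising `min_support` shrinks the candidate schema to the properties that almost every instance carries, while lowering it admits rarer properties. That is the trade-off a user-supplied completeness threshold would control in this sketch.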
