Spread2RML: Constructing Knowledge Graphs by Predicting RML Mappings on Messy Spreadsheets

The RDF Mapping Language (RML) allows mapping semi-structured data to RDF knowledge graphs. Besides CSV, JSON and XML, this also includes spreadsheet tables. Since spreadsheets have a complex data model and can become rather messy, creating mappings for them tends to be very time-consuming. To reduce this effort, this paper presents Spread2RML, which predicts RML mappings on messy spreadsheets. It does so with an extensible set of RML object map templates that are applied to each column based on heuristics. Our evaluation uses three datasets, ranging from very messy synthetic data to less messy spreadsheets from data.gov. We obtained promising first results, especially given that our approach is fully automatic and deals with rather messy data.
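
The idea of applying object map templates per column based on heuristics can be illustrated with a minimal sketch. It is not the Spread2RML implementation: the class `ObjectMapTemplate`, the function `predict_templates`, the three example templates, and the 0.8 threshold are all hypothetical and chosen for illustration only. Each template scores a column by the fraction of non-empty cells it matches, and the best-scoring template above the threshold is selected; otherwise a plain string template is used.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical structure: a template has a name, a heuristic score over the
# cells of one column, and a renderer that emits a Turtle object map snippet.
@dataclass
class ObjectMapTemplate:
    name: str
    score: Callable[[List[str]], float]
    render: Callable[[str], str]

def fraction(cells: List[str], predicate: Callable[[str], bool]) -> float:
    """Share of non-empty cells that satisfy the predicate (0.0 if all empty)."""
    values = [c.strip() for c in cells if c and c.strip()]
    if not values:
        return 0.0
    return sum(1 for v in values if predicate(v)) / len(values)

def is_integer(value: str) -> bool:
    return re.fullmatch(r"[+-]?\d+", value) is not None

def is_boolean(value: str) -> bool:
    return value.lower() in {"true", "false", "yes", "no"}

# Illustrative templates only; the set is meant to be extensible.
TEMPLATES = [
    ObjectMapTemplate(
        "integer",
        lambda cells: fraction(cells, is_integer),
        lambda col: f'rr:objectMap [ rml:reference "{col}" ; rr:datatype xsd:integer ]',
    ),
    ObjectMapTemplate(
        "boolean",
        lambda cells: fraction(cells, is_boolean),
        lambda col: f'rr:objectMap [ rml:reference "{col}" ; rr:datatype xsd:boolean ]',
    ),
]

# Fallback when no heuristic is confident enough: a plain literal.
FALLBACK = ObjectMapTemplate(
    "string",
    lambda cells: 0.0,
    lambda col: f'rr:objectMap [ rml:reference "{col}" ]',
)

def predict_templates(columns: Dict[str, List[str]],
                      threshold: float = 0.8) -> Dict[str, ObjectMapTemplate]:
    """Pick, per column, the best-scoring template at or above the threshold."""
    chosen = {}
    for name, cells in columns.items():
        best = max(TEMPLATES, key=lambda t: t.score(cells))
        chosen[name] = best if best.score(cells) >= threshold else FALLBACK
    return chosen

columns = {
    "age": ["42", "17", "", "8"],
    "active": ["yes", "no", "yes"],
    "note": ["call back", "n/a", "12"],
}
prediction = predict_templates(columns)
for column, template in prediction.items():
    print(column, "->", template.render(column))
```

Adding a template in this sketch means appending one entry to `TEMPLATES`. The threshold leaves room for a few stray cells in an otherwise uniform column, which is a common form of messiness in spreadsheets.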
