RODI: A Benchmark for Automatic Mapping Generation in Relational-to-Ontology Data Integration

A major challenge in information management today is the integration of huge amounts of data distributed across multiple data sources. One suggested approach is ontology-based data integration, in which legacy data systems are integrated via a common ontology that represents a unified global view over all data sources. However, data is rarely created in terms of these ontologies; much of it resides in legacy relational databases. Mappings that relate the legacy relational data sources to the ontology therefore need to be constructed. Techniques and systems that construct such mappings automatically have recently been developed, but their quality is often evaluated only on self-designed benchmarks. This paper introduces RODI, a new publicly available benchmarking suite designed to cover a wide range of mapping challenges in Relational-to-Ontology Data Integration scenarios. RODI provides a set of relational data sources and ontologies that represent these challenges, as well as a scoring function with which the performance of relational-to-ontology mapping construction systems can be evaluated.
