Relaxing global-as-view in mediated data integration from linked data

Mediated data integration at runtime is rapidly gaining interest in scenarios where many different, independent and dynamic data sources need to be brought together. In a global-as-view approach, schema mappings express how to obtain data from each data source in terms of the global schema of the mediator. Key issues include the effort required to add and map new data sources, and the dependence of data sources on the global schema being explicitly expressed. It has been argued that the principles of Linked Data can be used to spread the cost of adding new sources in a pay-as-you-go model. We contribute a data integration framework that mitigates these issues by relating data sources under a global schema which is implicit and only partly known at the time a new data source joins. Mapping a data source requires only partial knowledge of that source and of the part of the global schema it will affect. A pay-as-you-go model can then be employed to guarantee eventual schema compliance. This approach was adopted in a large-scale data integration system for Smart Cities, where it allowed a short time-to-publish for new data and iterative schema refinements.
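As a rough illustration of the idea above, the following minimal sketch shows a mediator in which each source contributes only a partial mapping and the global schema is never declared up front: it is simply the union of the properties mapped so far. The `Mediator` class, its method names, and the sample records are all hypothetical and are not taken from the framework described in the paper.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Record = Dict[str, object]
# A mapping defines some global properties as views (functions) over a source record.
# It may cover only part of the source and only part of the global schema.
Mapping = Dict[str, Callable[[Record], object]]


class Mediator:
    """Hypothetical mediator with an implicit, incrementally growing global schema."""

    def __init__(self) -> None:
        self.sources: List[Tuple[List[Record], Mapping]] = []

    def register(self, records: Iterable[Record], mapping: Mapping) -> None:
        # A source joins with whatever partial mapping is available at the time.
        self.sources.append((list(records), mapping))

    def global_schema(self) -> Set[str]:
        # The schema is implicit: the union of all properties mapped so far.
        schema: Set[str] = set()
        for _, mapping in self.sources:
            schema |= set(mapping)
        return schema

    def query(self, properties: List[str]) -> List[Record]:
        # Each source answers only for the requested properties it has mapped.
        results: List[Record] = []
        for records, mapping in self.sources:
            covered = [p for p in properties if p in mapping]
            if not covered:
                continue
            for rec in records:
                results.append({p: mapping[p](rec) for p in covered})
        return results


# Illustrative data only (made-up records and property names).
mediator = Mediator()
mediator.register(
    [{"nm": "Central Library", "pc": "AB1 2CD"}],
    {"name": lambda r: r["nm"], "postcode": lambda r: r["pc"]},
)
mediator.register(
    [{"title": "City Park", "lat": 52.04, "lon": -0.76}],
    {"name": lambda r: r["title"], "location": lambda r: (r["lat"], r["lon"])},
)
answers = mediator.query(["name", "location"])
```

In this sketch the first source has no `location` mapping yet, so its answers carry only `name`; adding that mapping later would be one pay-as-you-go refinement step towards fuller schema compliance.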
