Data lineage in the MOMIS data fusion system

Data lineage is an open research problem. This is particularly true in data integration systems, which integrate information coming from different sources that may be uncertain or even inconsistent with each other. In this context, the ability to trace the lineage of specific data can help unravel unexpected or questionable results. In this paper, we describe our preliminary work on this problem in the context of the MOMIS data integration system. We discuss and compare the use of Lineage-CS and PI-CS provenance, introduced in [1] and [2] respectively, for the data fusion operator used in the MOMIS system; in particular, we evaluate how the computation of PI-CS provenance should be extended to deal with the Resolution Functions used in our data fusion system.

[1] Gustavo Alonso, et al. TRAMP: Understanding the Behavior of Schema Mappings through Provenance, 2010, Proc. VLDB Endow.

[2] Felix Naumann, et al. Data Fusion – Resolving Data Conflicts for Integration, 2009.

[3] Felix Naumann, et al. Conflict Handling Strategies in an Integrated Information System, 2006.

[4] Gustavo Alonso, et al. Perm: Processing Provenance and Data on the Same Data Model through Query Rewriting, 2009, 2009 IEEE 25th International Conference on Data Engineering.

[5] Boris Glavic. Formal Foundation of Contribution Semantics and Provenance Computation through Query Rewrite in TRAMP, 2010.

[6] Chen Li, et al. Information Integration Research: Summary of NSF IDM Workshop Breakout Session, 2004.

[7] Val Tannen, et al. Provenance semirings, 2007, PODS.

[8] Jennifer Widom, et al. Lineage tracing for general data warehouse transformations, 2003, The VLDB Journal.

[9] Maurizio Vincini, et al. Synthesizing an Integrated Ontology, 2003, IEEE Internet Comput.

[10] Jennifer Widom, et al. Tracing the lineage of view data in a warehousing environment, 2000, TODS.

[11] Silvana Castano, et al. Semantic integration of heterogeneous information sources, 2001, Data Knowl. Eng.

[12] Maurizio Lenzerini, et al. Data integration: a theoretical perspective, 2002, PODS.

[13] V. Vianu, et al. Edinburgh Why and Where: A Characterization of Data Provenance, 2017.

[14] Setsuo Ohsuga, et al. INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, 1977.

[15] Felix Naumann, et al. Data fusion, 2009, CSUR.

[16] Joann J. Ordille, et al. Data integration: the teenage years, 2006, VLDB.

[17] MAJ Paul B. Lester, et al. Data Integration, 2014, Encyclopedia of Social Network Analysis and Mining.

[18] G. Höfner, et al. Data integration, 1993.

[19] Felix Naumann, et al. Completeness of integrated information sources, 2004, Inf. Syst.