Compact explanation of data fusion decisions

Despite the abundance of useful information on the Web, different Web sources often provide conflicting data, some of which is out-of-date, inaccurate, or erroneous. Data fusion aims to resolve such conflicts and find the truth. Advanced fusion techniques apply an iterative MAP (Maximum A Posteriori) analysis that reasons about the trustworthiness of sources and the copying relationships between them. Explaining such decisions is important for understanding them, but can be extremely challenging because of the complexity of the analysis behind each decision. This paper proposes two types of explanations for data-fusion results: snapshot explanations take the provided data and any other decisions inferred from the data as evidence, and provide a high-level understanding of a fusion decision; comprehensive explanations take only the data as evidence, and provide an in-depth understanding of a fusion decision. We propose techniques that efficiently generate correct and compact explanations. Experimental results show that (1) we generate correct explanations, (2) our techniques significantly reduce the size of the explanations, and (3) we generate the explanations efficiently.
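To make the idea of an iterative analysis over source trustworthiness more concrete, here is a minimal Python sketch of accuracy-weighted truth discovery, followed by a toy snapshot-style explanation. It is not the paper's algorithm: it ignores copying between sources, assumes a log-odds vote weight ln(n·A/(1−A)) for a source with accuracy A (borrowed from ACCU-style models), normalizes only over observed values, and the names `fuse`, `explain`, `vote_weight`, `n_false`, and `k` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical input format: claims[source][item] = value the source provides.


def vote_weight(acc, n_false, eps=1e-6):
    """Log-odds vote weight ln(n * A / (1 - A)); an assumed ACCU-style formula."""
    a = min(max(acc, eps), 1 - eps)  # clamp so the log stays finite
    return math.log(n_false * a / (1 - a))


def fuse(claims, n_false=10, iters=20, init_acc=0.8):
    """Alternate between value probabilities and source accuracies.

    Assumption: copying between sources is ignored in this sketch.
    """
    acc = {s: init_acc for s in claims}
    probs = {}
    for _ in range(iters):
        # 1. Confidence of a value = sum of vote weights of the sources providing it.
        conf = defaultdict(lambda: defaultdict(float))
        for s, items in claims.items():
            w = vote_weight(acc[s], n_false)
            for item, value in items.items():
                conf[item][value] += w
        # 2. Softmax over the values observed for each item (unobserved values ignored).
        probs = {}
        for item, scores in conf.items():
            top = max(scores.values())  # shift for numerical stability
            exps = {v: math.exp(c - top) for v, c in scores.items()}
            total = sum(exps.values())
            probs[item] = {v: e / total for v, e in exps.items()}
        # 3. Accuracy of a source = mean probability of the values it provides.
        acc = {
            s: sum(probs[i][v] for i, v in items.items()) / len(items) if items else init_acc
            for s, items in claims.items()
        }
    return acc, probs


def explain(claims, acc, probs, item, k=2, n_false=10):
    """Toy snapshot-style explanation: the decided value plus the k strongest
    supporting (pro) and opposing (con) sources, ranked by vote weight."""
    decided = max(probs[item], key=probs[item].get)
    pros, cons = [], []
    for s, items in claims.items():
        if item in items:
            side = pros if items[item] == decided else cons
            side.append((vote_weight(acc[s], n_false), s))
    pros.sort(reverse=True)
    cons.sort(reverse=True)
    # Keeping only the top-k sources per side is a crude way to stay compact.
    return decided, [s for _, s in pros[:k]], [s for _, s in cons[:k]]


claims = {
    "S1": {"capital": "Paris", "river": "Seine"},
    "S2": {"capital": "Paris", "river": "Seine"},
    "S3": {"capital": "Lyon", "river": "Seine"},
    "S4": {"capital": "Paris", "river": "Loire"},
}
acc, probs = fuse(claims)
print(explain(claims, acc, probs, "capital"))
```

In this toy run, S1 and S2 agree on every item and end up with high estimated accuracy, so they dominate the "pro" side for `capital`, while S3 is the only opposing source. Truncating each side to the top-k sources only hints at what a compact explanation might look like; the paper's own techniques for producing correct and compact explanations are not reproduced here.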
