An Integrated Bayesian Approach for Effective Multi-Truth Discovery

Truth-finding is a fundamental technique for corroborating reports from multiple sources in both data integration and collective intelligence applications. Traditional truth-finding methods assume a single true value for each data item and therefore cannot deal with multiple true values (i.e., the multi-truth-finding problem). So far, existing approaches handle the multi-truth-finding problem in the same way as the single-truth-finding problem. However, the multi-truth-finding problem has unique features, such as the involvement of sets of values in claims, different implications of inter-value mutual exclusion, and larger source profiles. Considering these features could provide new opportunities for obtaining more accurate truth-finding results. Based on this insight, we propose an integrated Bayesian approach to the multi-truth-finding problem that takes these features into account. To improve truth-finding efficiency, we reformulate the multi-truth-finding problem model based on the mappings between sources and (sets of) values. New mutual exclusion relations are defined to reflect the possible co-existence of multiple true values. A finer-grained copy detection method is also proposed to deal with sources with large profiles. Experimental results on three real-world datasets show the effectiveness of our approach.
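To make the multi-truth setting concrete, the following is a minimal sketch of a generic Bayesian treatment in which each candidate value is judged independently, so several values of one data item can be accepted together. It is not the integrated approach proposed here: it ignores set-level claims, mutual exclusion relations, and copy detection. The function name `value_posteriors`, the per-source `sensitivity` and `specificity` parameters, the uniform `prior`, and the toy claims are all assumptions made for illustration.

```python
from math import prod


def value_posteriors(claims, sensitivity, specificity, prior=0.5):
    """Posterior probability that each candidate value is true.

    claims:      dict mapping source -> set of values it claims for ONE data item
    sensitivity: dict mapping source -> P(source claims v | v is true)   (assumed known)
    specificity: dict mapping source -> P(source omits v  | v is false)  (assumed known)
    prior:       prior probability that any candidate value is true      (assumed)
    """
    candidates = set().union(*claims.values())
    posteriors = {}
    for v in candidates:
        # Likelihood of the observed claim pattern if v is true.
        like_true = prod(
            sensitivity[s] if v in vals else 1.0 - sensitivity[s]
            for s, vals in claims.items()
        )
        # Likelihood of the same pattern if v is false.
        like_false = prod(
            1.0 - specificity[s] if v in vals else specificity[s]
            for s, vals in claims.items()
        )
        numerator = prior * like_true
        posteriors[v] = numerator / (numerator + (1.0 - prior) * like_false)
    return posteriors


# Toy example (hypothetical data): three sources list authors of one book.
claims = {"s1": {"A", "B"}, "s2": {"A", "B"}, "s3": {"A", "C"}}
sensitivity = {s: 0.8 for s in claims}
specificity = {s: 0.9 for s in claims}

posteriors = value_posteriors(claims, sensitivity, specificity)
# Accept every value whose posterior exceeds 0.5; more than one may pass.
accepted = {v for v, p in posteriors.items() if p > 0.5}
print(sorted(accepted))
```

Because each value is scored on its own, a value claimed by two of three sources can be accepted alongside one claimed by all three, which a single-truth method that picks only the top-scoring value would miss.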
