Approximate Truth Discovery via Problem Scale Reduction

Many real-world applications rely on multiple data sources to provide information about items of interest. Because data are noisy and uncertain, the information that different sources provide about the same item may conflict. To make reliable decisions based on such data, it is important to identify the trustworthy information by resolving these conflicts, a task known as the truth discovery problem. Current solutions estimate the veracity of each value jointly with the reliability of each source, for every data item. As a result, the efficiency of truth discovery is strictly bounded by the problem scale, which in turn prevents truth discovery algorithms from being applied at large scale. To address this issue, we propose an approximate truth discovery approach that divides sources and values into groups according to a user-specified approximation criterion. The groups are then used to compute inter-value influence efficiently, which improves accuracy. Our approach is applicable to most existing truth discovery algorithms. Experiments on real-world datasets show that our approach is more efficient than existing algorithms while achieving similar or even better accuracy. Experiments on large synthetic datasets further demonstrate its scalability.
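
To make the joint estimation of value veracity and source reliability concrete, the following is a minimal sketch of a generic iterative truth discovery loop. It is not the approach proposed here and not any specific published algorithm; the function name `discover_truth`, the uniform initial trust of 0.5, and the trust update (fraction of a source's claims that match the current truths) are all illustrative assumptions.

```python
from collections import defaultdict


def discover_truth(claims, iterations=10):
    """Toy iterative truth discovery (illustrative assumption, not the paper's method).

    claims: list of (source, item, value) triples.
    Returns (truths, trust): the chosen value per item and a score per source.
    """
    sources = {s for s, _, _ in claims}
    trust = {s: 0.5 for s in sources}  # assumed uniform starting trust
    truths = {}
    for _ in range(iterations):
        # Step 1: score every claimed value by the summed trust of its sources.
        scores = defaultdict(lambda: defaultdict(float))
        for s, item, value in claims:
            scores[item][value] += trust[s]
        truths = {item: max(vals, key=vals.get) for item, vals in scores.items()}

        # Step 2: re-estimate each source's trust as the fraction of its
        # claims that agree with the currently chosen truths.
        correct = defaultdict(int)
        total = defaultdict(int)
        for s, item, value in claims:
            total[s] += 1
            correct[s] += int(truths[item] == value)
        trust = {s: correct[s] / total[s] for s in sources}
    return truths, trust


if __name__ == "__main__":
    example_claims = [
        ("A", "x", "1"), ("B", "x", "1"), ("C", "x", "2"),
        ("A", "y", "3"), ("B", "y", "4"), ("C", "y", "4"),
        ("A", "z", "5"), ("B", "z", "5"), ("C", "z", "6"),
    ]
    truths, trust = discover_truth(example_claims)
    print(truths)
    print(trust)
```

Every iteration of this sketch visits each (source, item, value) claim twice, so its running time grows with the total number of claims. That dependence on problem scale is the cost that grouping sources and values is meant to reduce.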
