SmartMTD: A Graph-Based Approach for Effective Multi-Truth Discovery

The Big Data era is characterized by huge amounts of data contributed by numerous sources and consumed by many critical data-driven applications. Because sources vary in reliability, multi-source data commonly contain conflicts, making it difficult to determine which sources to trust. Truth discovery has recently emerged as a way to address this challenge by estimating data veracity jointly with the reliability of data sources. A fundamental limitation of current truth discovery methods is that they generally assume each object has only one true value, whereas in reality an object may have multiple true values. In this paper, we propose a graph-based approach, called SmartMTD, to address truth discovery beyond the single-truth assumption, i.e., the multi-truth discovery problem. SmartMTD models and quantifies two types of source relations in order to estimate source reliability precisely and to detect malicious agreement among sources. Specifically, two graphs are constructed from the modeled source relations and are then used to derive two aspects of source reliability (positive precision and negative precision) via random walk computation. Empirical studies on two large real-world datasets demonstrate the effectiveness of our approach.
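To make the phrase "random walk computation" concrete, the sketch below runs a generic PageRank-style random walk with restart over a small weighted graph of sources. It is an illustration only, not SmartMTD's actual procedure: the abstract does not specify how the two source-relation graphs are built or weighted, so the function name `random_walk_scores`, the `damping` value of 0.85, and the toy `agreement` matrix are all assumptions introduced here.

```python
def random_walk_scores(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Stationary scores of a random walk with restart on a weighted graph.

    weights[i][j] is a non-negative edge weight from source i to source j.
    What the weight means (e.g. degree of agreement) is a hypothetical
    choice for this sketch, not something taken from the paper.
    """
    n = len(weights)

    # Row-normalize weights into transition probabilities.
    # A node with no outgoing weight jumps uniformly to any node.
    transition = []
    for row in weights:
        total = sum(row)
        if total > 0:
            transition.append([w / total for w in row])
        else:
            transition.append([1.0 / n] * n)

    # Power iteration: start uniform and repeat until scores stop changing.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new_scores = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Toy graph of three hypothetical sources (symmetric agreement weights).
agreement = [
    [0.0, 2.0, 0.0],
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
scores = random_walk_scores(agreement)
```

In this toy graph the middle source, which has the largest total agreement weight, ends up with the highest score. In a setting like the one the abstract describes, one such walk could be run per graph to obtain one score per reliability aspect, though how SmartMTD maps walk scores to positive and negative precision is not stated here.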
