SmartVote: a full-fledged graph-based model for multi-valued truth discovery

In the era of Big Data, truth discovery has emerged as a fundamental research topic: it estimates data veracity by determining the reliability of multiple, often conflicting, data sources. Although considerable research effort has been devoted to this topic, most current approaches assume that each object has only one true value. In reality, objects with multiple true values are common, and existing approaches that handle multi-valued objects still lack accuracy. In this paper, we propose a full-fledged graph-based model, SmartVote, which models two types of source relations with additional quantification to precisely estimate source reliability for effective multi-valued truth discovery. Two graphs are constructed and used to derive different aspects of source reliability (i.e., positive precision and negative precision) via random walk computations. To improve the accuracy of truth discovery, our model incorporates four important implications: two types of source relations, object popularity, loose mutual exclusion, and the long-tail phenomenon in source coverage. Empirical studies on two large real-world datasets demonstrate the effectiveness of our approach.
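
The abstract says that source reliability is derived from random walks over source graphs, but it does not give SmartVote's graph construction or update rules. The code below is therefore only a minimal sketch under stated assumptions. It runs a generic PageRank-style random walk with teleportation (power iteration) over a single weighted source graph. The edge weight between two sources is assumed to be the number of (object, value) claims they share. This choice is hypothetical and is not the paper's quantification of source relations. SmartVote itself builds two graphs, one per precision aspect. The helper names `agreement_graph` and `random_walk_scores` and the damping value 0.85 are illustrative assumptions.

```python
from typing import Dict, Set, Tuple

Claim = Tuple[str, str]  # (object, value) pair asserted by a source


def agreement_graph(claims: Dict[str, Set[Claim]]) -> Dict[str, Dict[str, float]]:
    """Build a weighted source graph.

    Hypothetical weighting: the edge a -> b carries the number of
    (object, value) claims that sources a and b share.
    """
    sources = sorted(claims)
    graph: Dict[str, Dict[str, float]] = {s: {} for s in sources}
    for a in sources:
        for b in sources:
            if a != b:
                shared = len(claims[a] & claims[b])
                if shared:
                    graph[a][b] = float(shared)
    return graph


def random_walk_scores(
    graph: Dict[str, Dict[str, float]],
    damping: float = 0.85,  # assumed teleportation setting, not from the paper
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Stationary distribution of a random walk with uniform teleportation."""
    nodes = sorted(graph)
    n = len(nodes)
    scores = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Teleportation mass is spread uniformly over all sources.
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = graph[u]
            total = sum(out.values())
            if total == 0:
                # Dangling source (no shared claims): spread its mass uniformly.
                for v in nodes:
                    new[v] += damping * scores[u] / n
            else:
                # Follow out-edges in proportion to their weights.
                for v, w in out.items():
                    new[v] += damping * scores[u] * w / total
        diff = sum(abs(new[v] - scores[v]) for v in nodes)
        scores = new
        if diff < tol:
            break
    return scores


if __name__ == "__main__":
    claims = {
        "s1": {("book1", "Alice"), ("book1", "Bob"), ("book2", "Carol")},
        "s2": {("book1", "Alice"), ("book2", "Carol")},
        "s3": {("book1", "Dave")},
    }
    # s1 and s2 corroborate each other and tie; s3 shares no claims and scores lowest.
    print(random_walk_scores(agreement_graph(claims)))
```

The scores form a probability distribution over sources. Because the walk teleports uniformly, a source that shares no claims with any other source still receives a small, nonzero score rather than zero.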
