Truth Discovery and Crowdsourcing Aggregation: A Unified Perspective

In the era of Big Data, data entries describing the same objects or events can come from a variety of sources, where a source can be a web page, a database, or a person. Consequently, conflicts among sources are inevitable. To resolve these conflicts and obtain high-quality data, truth discovery and crowdsourcing aggregation have been studied intensively. However, although the two topics have much in common, they have been studied separately and applied to different domains. To meet the need for a systematic introduction to and comparison of the two topics, this tutorial presents an organized picture of truth discovery and crowdsourcing aggregation. The two are compared at both the theory and application levels, and related areas and open questions are discussed.
