FaitCrowd: Fine Grained Truth Discovery for Crowdsourced Data Aggregation

In crowdsourced data aggregation task, there exist conflicts in the answers provided by large numbers of sources on the same set of questions. The most important challenge for this task is to estimate source reliability and select answers that are provided by high-quality sources. Existing work solves this problem by simultaneously estimating sources' reliability and inferring questions' true answers (i.e., the truths). However, these methods assume that a source has the same reliability degree on all the questions, but ignore the fact that sources' reliability may vary significantly among different topics. To capture various expertise levels on different topics, we propose FaitCrowd, a fine grained truth discovery model for the task of aggregating conflicting data collected from multiple users/sources. FaitCrowd jointly models the process of generating question content and sources' provided answers in a probabilistic model to estimate both topical expertise and true answers simultaneously. This leads to a more precise estimation of source reliability. Therefore, FaitCrowd demonstrates better ability to obtain true answers for the questions compared with existing approaches. Experimental results on two real-world datasets show that FaitCrowd can significantly reduce the error rate of aggregation compared with the state-of-the-art multi-source aggregation approaches due to its ability of learning topical expertise from question content and collected answers.

[1]  Huiping Sun,et al.  CQArank: jointly model topics and expertise in community question answering , 2013, CIKM.

[2]  Brendan T. O'Connor,et al.  Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks , 2008, EMNLP.

[3]  Heng Ji,et al.  Overview of the TAC 2010 Knowledge Base Population Track , 2010 .

[4]  Murat Demirbas,et al.  Crowdsourcing for Multiple-Choice Question Answering , 2014, AAAI.

[5]  Divesh Srivastava,et al.  Integrating Conflicting Data: The Role of Source Dependence , 2009, Proc. VLDB Endow..

[6]  Bo Zhao,et al.  A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration , 2012, Proc. VLDB Endow..

[7]  Jiawei Han,et al.  A Probabilistic Model for Estimating Real-valued Truth from Conflicting Sources , 2012 .

[8]  Chunping Li,et al.  Topic-Level Expert Modeling in Community Question Answering , 2013, SDM.

[9]  Divesh Srivastava,et al.  Less is More: Selecting Sources Wisely for Integration , 2012, Proc. VLDB Endow..

[10]  Serge Abiteboul,et al.  Corroborating information from disagreeing views , 2010, WSDM '10.

[11]  Dan Roth,et al.  Latent credibility analysis , 2013, WWW.

[12]  Bo Zhao,et al.  Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation , 2014, SIGMOD Conference.

[13]  Dan Roth,et al.  Knowing What to Believe (when you already know something) , 2010, COLING.

[14]  Bo Zhao,et al.  A Survey on Truth Discovery , 2015, SKDD.

[15]  Gerhard Weikum,et al.  People on drugs: credibility of user statements in health communities , 2014, KDD.

[16]  Bo Zhao,et al.  A Confidence-Aware Approach for Truth Discovery on Long-Tail Data , 2014, Proc. VLDB Endow..

[17]  John C. Platt,et al.  Learning from the Wisdom of Crowds by Minimax Entropy , 2012, NIPS.

[18]  A. P. Dawid,et al.  Maximum Likelihood Estimation of Observer Error‐Rates Using the EM Algorithm , 1979 .

[19]  Lance Kaplan,et al.  On truth discovery in social sensing: A maximum likelihood estimation approach , 2012, 2012 ACM/IEEE 11th International Conference on Information Processing in Sensor Networks (IPSN).

[20]  Dan Roth,et al.  Content-driven trust propagation framework , 2011, KDD.

[21]  Xue Liu,et al.  Generalized Decision Aggregation in Distributed Sensing Systems , 2014, 2014 IEEE Real-Time Systems Symposium.

[22]  Javier R. Movellan,et al.  Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise , 2009, NIPS.

[23]  Pietro Perona,et al.  The Multidimensional Wisdom of Crowds , 2010, NIPS.

[24]  Gerardo Hermosillo,et al.  Learning From Crowds , 2010, J. Mach. Learn. Res..

[25]  Philip S. Yu,et al.  Truth Discovery with Multiple Conflicting Information Providers on the Web , 2007, IEEE Transactions on Knowledge and Data Engineering.

[26]  Yong Yu,et al.  Tapping on the potential of q&a community by recommending answer providers , 2008, CIKM '08.

[27]  Hongfei Yan,et al.  Comparing Twitter and Traditional Media Using Topic Models , 2011, ECIR.

[28]  Milad Shokouhi,et al.  Community-based bayesian aggregation models for crowdsourcing , 2014, WWW.

[29]  Gianluca Demartini,et al.  ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking , 2012, WWW.

[30]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[31]  Hanna M. Wallach,et al.  Topic modeling: beyond bag-of-words , 2006, ICML.