Collective intelligence, which aggregates information shared by large crowds, is often degraded by unreliable sources that contribute low-quality data. This is a barrier to the effective use of collective intelligence in a variety of applications. To address this issue, we propose a probabilistic model that jointly assesses the reliability of sources and finds the true data. We observe that different sources are often not independent of each other. Instead, sources tend to influence one another, which makes them dependent when they share information. High dependency between sources makes collective intelligence vulnerable to the overuse of redundant (and possibly incorrect) information from the dependent sources. We therefore reveal the latent group structure among dependent sources and aggregate information at the group level rather than from individual sources directly. This prevents collective intelligence from being inappropriately dominated by dependent sources. We also explicitly reveal the reliability of groups and minimize the negative impact of unreliable groups. Experimental results on real-world data sets show the effectiveness of the proposed approach compared with existing algorithms.
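To make the group-level idea concrete, the following is a minimal sketch and not the proposed probabilistic model. It assumes the group assignment of each source is already known, whereas the approach described above infers the latent groups and their reliability jointly. The function names, the `groups` mapping, and the `group_reliability` weights are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def source_level_vote(claims):
    """Plain majority vote: every source counts once."""
    return Counter(claims.values()).most_common(1)[0][0]


def group_level_vote(claims, groups, group_reliability=None):
    """Each group casts one vote (its internal majority value).

    `groups` maps source -> group id and is assumed to be given here;
    `group_reliability` optionally maps group id -> weight (assumed values).
    """
    by_group = defaultdict(list)
    for source, value in claims.items():
        by_group[groups[source]].append(value)

    scores = defaultdict(float)
    for group, values in by_group.items():
        group_value = Counter(values).most_common(1)[0][0]
        weight = 1.0 if group_reliability is None else group_reliability[group]
        scores[group_value] += weight
    return max(scores, key=scores.get)


# Three mutually dependent sources (one group) repeat "A";
# two independent sources each report "B".
claims = {"s1": "A", "s2": "A", "s3": "A", "s4": "B", "s5": "B"}
groups = {"s1": "g1", "s2": "g1", "s3": "g1", "s4": "g2", "s5": "g3"}

print(source_level_vote(claims))         # dependent sources dominate
print(group_level_vote(claims, groups))  # redundancy is collapsed per group
```

In this toy input the three dependent sources win a source-level vote, while the group-level vote counts their redundant claim only once. Passing `group_reliability` lets a less reliable group contribute less to the final score.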