A Faster Algorithm for Truth Discovery via Range Cover

Truth discovery is a key problem in data analytics that has received a great deal of attention in recent years. In this problem, we seek to obtain trustworthy information from data aggregated from multiple (possibly unreliable) sources. Most existing approaches to this problem are heuristic in nature and do not provide any quality guarantee. Very recently, the first quality-guaranteed algorithm was discovered. However, its running time depends on the spread ratio of the input points and is fully polynomial only when the spread ratio is relatively small, which could restrict the applicability of the algorithm. To resolve this issue, we propose in this paper a new algorithm which, with constant probability, yields a $(1+\epsilon)$-approximation in near-quadratic time for any dataset. Our algorithm relies on a data structure called range cover, which is interesting in its own right. The data structure provides a general approach for solving some high-dimensional optimization problems by breaking them down into a small number of parametrized cases.
