Towards Confidence Interval Estimation in Truth Discovery

The demand for automatic extraction of true information (i.e., truths) from conflicting multi-source data has soared recently. A variety of truth discovery methods have achieved great success by jointly estimating source reliability and truths. All existing truth discovery methods focus on providing a point estimator for each object's truth, but in many real-world applications, confidence interval estimation of truths is more desirable, since a confidence interval contains richer information. To address this challenge, we propose a novel truth discovery method (ETCIBoot) that constructs confidence interval estimates as well as identifying truths, with bootstrapping techniques integrated into the truth discovery procedure. Due to the properties of bootstrapping, the estimators obtained by ETCIBoot are more accurate and robust than those of state-of-the-art truth discovery approaches. The proposed framework is further adapted to deal with large-scale truth discovery tasks in a distributed paradigm. Theoretically, we prove the asymptotic consistency of the confidence interval obtained by ETCIBoot. Experimentally, we demonstrate that ETCIBoot is not only effective in constructing confidence intervals but also able to obtain better truth estimates.
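To make the idea of attaching a confidence interval to a truth estimate concrete, the following is a minimal sketch of a generic percentile bootstrap for a single object. It is not the ETCIBoot algorithm itself: the function names, the fixed source weights, the use of a weighted mean as the truth estimator, and the averaging of bootstrap estimates into a point estimate are all assumptions made for illustration. In a real truth discovery method the source weights would be estimated jointly with the truths rather than supplied by hand.

```python
import random
import statistics


def weighted_mean(claims, weights):
    """Weighted average of the claims; weights are assumed positive."""
    total = sum(weights)
    return sum(c * w for c, w in zip(claims, weights)) / total


def bootstrap_ci(claims, weights, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a weighted-mean truth estimate.

    claims  -- values reported by the sources for one object
    weights -- hypothetical reliability weights, one per source
    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    n = len(claims)
    estimates = []
    for _ in range(n_boot):
        # Resample sources with replacement, keeping each claim with its weight.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            weighted_mean([claims[i] for i in idx], [weights[i] for i in idx])
        )
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    # Assumption: use the average of the bootstrap estimates as the point estimate.
    point = statistics.fmean(estimates)
    return point, lower, upper


# Illustrative (made-up) claims from six sources about one numeric quantity.
claims = [20.1, 19.8, 20.4, 25.0, 20.0, 19.9]
weights = [1.0, 1.2, 0.9, 0.2, 1.1, 1.0]
point, lower, upper = bootstrap_ci(claims, weights)
print(f"estimate={point:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

The interval width reflects how much the estimate moves when the set of contributing sources changes, which is the kind of extra information a point estimate alone cannot convey.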
