On the Discovery of Evolving Truth

In the era of big data, information about the same objects can be collected from a growing number of sources. Unfortunately, the information from different sources often conflicts. Truth discovery, which integrates noisy multi-source information by estimating the reliability of each source, has emerged as an active research topic for tackling this challenge. In many real-world applications, however, information arrives sequentially, so both the truths of objects and the reliability of sources may evolve over time. Existing truth discovery methods cannot handle such scenarios. To address this problem, we investigate the temporal relations among both object truths and source reliability, and propose an incremental truth discovery framework that dynamically updates object truths and source weights as new data arrive. Theoretical analysis shows that the proposed method is guaranteed to converge at a fast rate. Experiments on three real-world applications and a set of synthetic data demonstrate the advantages of the proposed method over state-of-the-art truth discovery methods.
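
To make the idea of incrementally updating truths and source weights concrete, the following is a minimal illustrative sketch and not the framework proposed in this work. It assumes continuous-valued claims, a weighted-average truth estimate, and a source weight that is the inverse of an exponentially decayed squared error. The function name `incremental_truth_discovery` and the parameters `decay` and `smoothing` are hypothetical and chosen only for illustration.

```python
def incremental_truth_discovery(batches, decay=0.9, smoothing=1.0):
    """Process claims one timestamp at a time (illustrative sketch only).

    batches: iterable of dicts, one per timestamp, mapping
             object -> {source: claimed numeric value} (non-empty per object).
    decay:   assumed forgetting factor applied to past source errors.
    smoothing: assumed constant that keeps weights finite for error-free sources.
    Returns (list of per-timestamp truth dicts, final source weights).
    """
    cum_err = {}           # decayed cumulative squared error per source
    weights = {}           # source weights carried over from the previous step
    truths_over_time = []

    for claims in batches:
        # Step 1: estimate truths as weighted averages, using the weights
        # learned so far (unseen sources default to weight 1.0).
        truths = {}
        for obj, by_source in claims.items():
            total_w = sum(weights.get(s, 1.0) for s in by_source)
            truths[obj] = sum(
                weights.get(s, 1.0) * v for s, v in by_source.items()
            ) / total_w
        truths_over_time.append(truths)

        # Step 2: discount old errors, then add each source's new squared error.
        for s in cum_err:
            cum_err[s] *= decay
        for obj, by_source in claims.items():
            for s, v in by_source.items():
                cum_err[s] = cum_err.get(s, 0.0) + (v - truths[obj]) ** 2

        # Step 3: sources with smaller accumulated error get larger weights.
        weights = {s: 1.0 / (e + smoothing) for s, e in cum_err.items()}

    return truths_over_time, weights


# Two timestamps, one object "x", three sources; source "C" is consistently off.
example_batches = [
    {"x": {"A": 10.0, "B": 10.0, "C": 16.0}},
    {"x": {"A": 20.0, "B": 20.0, "C": 26.0}},
]
example_truths, example_weights = incremental_truth_discovery(example_batches)
print(example_truths, example_weights)
```

In this sketch only the running per-source error is kept between timestamps, so earlier batches never need to be revisited. The decay factor is one simple way to let source reliability drift over time; other choices are equally plausible.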
