A probabilistic model for truth discovery with object correlations

In the era of big data, information can be collected from many sources. Unfortunately, the information that multiple sources provide about the same object often conflicts. Truth discovery has emerged to address this challenge and has been used in many applications. Its advantage is that it incorporates source reliabilities when inferring object truths. Many truth discovery methods with different characteristics have been proposed. However, most of them ignore object correlations in the data and focus on static data only, even though object correlations exist in many applications. In this work, we propose a probabilistic truth discovery model that considers not only source reliability but also object correlations. This is especially useful when objects are claimed by only a few sources, which is common in many real applications. Furthermore, we develop an incremental truth discovery method that considers object correlations for the case in which data provided by multiple sources arrives sequentially. Truths can be inferred dynamically without revisiting historical data, and temporal correlation is taken into account during truth inference. Experiments on both real-world and synthetic datasets demonstrate that the proposed methods perform better than existing truth discovery methods.
