Truth selection for truth discovery models exploiting ordering relationship among values

Data veracity is one of the main issues regarding Web data. Truth Discovery models can be used to assess it: they estimate value confidence and source trustworthiness by analyzing the claims that different sources provide about the same real-world entities. Many studies have been conducted in this domain. Most models select as true the value with the highest estimated confidence. This naive strategy cannot be applied when a partial order among values is taken into account to improve the final performance. Indeed, in this case, the resulting confidence estimations increase monotonically with respect to the partial order of values: the highest confidence is always assigned to the most general value, which is implicitly supported by all the others. Thus, using the highest confidence as the criterion for selecting true values is not appropriate, because it will always return the most general values. To address this problem, we propose a post-processing procedure that leverages the partial order among values and their monotonic confidence estimations to identify the expected true value. Experimental results on synthetic datasets show the effectiveness of our approach.
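The problem described above can be illustrated with a small sketch. It is not the procedure proposed in the paper, whose details the abstract does not give. It assumes a tree-shaped order (a general partial order would be a DAG), made-up support scores, and a hypothetical greedy rule controlled by an assumed `ratio` parameter: starting from the most general value, descend to the most confident child while that child keeps at least `ratio` of its parent's confidence.

```python
from typing import Dict, List

# Hypothetical value hierarchy: each value maps to its more specific values.
children: Dict[str, List[str]] = {
    "Europe": ["France", "Spain"],
    "France": ["Paris", "Lyon"],
}

# Made-up support that sources give directly to each value.
direct_support: Dict[str, float] = {
    "Europe": 1.0, "France": 2.0, "Spain": 1.0, "Paris": 5.0, "Lyon": 1.0,
}


def propagate_confidence(support: Dict[str, float],
                         tree: Dict[str, List[str]],
                         root: str) -> Dict[str, float]:
    """Give each value its own support plus that of all its descendants,
    so confidence grows monotonically toward more general values."""
    total: Dict[str, float] = {}

    def visit(value: str) -> float:
        total[value] = support.get(value, 0.0) + sum(
            visit(child) for child in tree.get(value, []))
        return total[value]

    visit(root)
    return total


def select_true_value(confidence: Dict[str, float],
                      tree: Dict[str, List[str]],
                      root: str,
                      ratio: float = 0.5) -> str:
    """Assumed greedy rule: walk down from the root, following the most
    confident child while it retains at least `ratio` of its parent's score."""
    current = root
    while True:
        kids = tree.get(current, [])
        if not kids:
            return current  # most specific value reached
        best = max(kids, key=lambda child: confidence[child])
        if confidence[best] < ratio * confidence[current]:
            return current  # no child is supported strongly enough
        current = best


confidence = propagate_confidence(direct_support, children, "Europe")
naive_choice = max(confidence, key=lambda value: confidence[value])
selected = select_true_value(confidence, children, "Europe")
print(naive_choice, selected)
```

With these numbers the naive highest-confidence rule returns the most general value, `Europe`, while the greedy descent reaches `Paris`. A stricter `ratio` stops the descent earlier and returns a more general value.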
