Truth Discovery to Resolve Object Conflicts in Linked Data

Because the Semantic Web is open, anyone can publish data as Linked Data on the web. As a result, RDF (Resource Description Framework) triples describing the same real-world entity can be obtained from multiple sources, which inevitably produces conflicting objects for a given predicate of that entity. The objective of this study is to identify the single true object among the conflicting objects for a given predicate of a real-world entity. An intuitive, common-sense principle is that an object from a reliable source is trustworthy and, in turn, a source that provides trustworthy objects is reliable. Many truth discovery methods based on this principle have been proposed to estimate source reliability and identify the truth. However, the effectiveness of existing truth discovery methods is significantly affected by the number of objects provided by each source. These methods therefore cannot be trivially extended to resolve conflicts in Linked Data, which has a scale-free property: most sources provide few conflicting objects, whereas only a few sources provide many. To address this challenge, we propose a novel approach called TruthDiscover to identify the truth in Linked Data with a scale-free property. TruthDiscover adopts two strategies to reduce the effect of the scale-free property on truth discovery. First, it leverages the topological properties of the Source Belief Graph to estimate the prior beliefs of sources, which are used to smooth the trustworthiness of sources. Second, it uses a Hidden Markov Random Field to model the interdependencies between objects so that the trust values of objects can be estimated accurately. Experiments are conducted on six datasets to evaluate TruthDiscover.
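The mutual-reinforcement principle stated in the abstract (reliable sources provide trustworthy objects, and sources providing trustworthy objects are reliable) can be illustrated with a minimal sketch. This is a generic iterative baseline, not the TruthDiscover algorithm: it uses neither the Source Belief Graph nor a Hidden Markov Random Field. The function name `iterative_truth_discovery`, the tuple layout of `claims`, the initial trust of 0.5, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def iterative_truth_discovery(claims, iterations=20):
    """Generic baseline sketch (hypothetical helper, not TruthDiscover).

    claims: list of (source, item, obj) tuples, where `item` stands for an
    (entity, predicate) pair and `obj` is the object the source asserts.
    Returns (truths, trust): the highest-confidence object per item and
    the estimated reliability of each source.
    """
    sources = {s for s, _, _ in claims}
    trust = {s: 0.5 for s in sources}  # assumed uniform starting reliability
    conf = {}

    for _ in range(iterations):
        # Step 1: an object's confidence is the summed trust of the
        # sources asserting it, normalized over the objects of that item.
        conf = defaultdict(float)
        for s, item, obj in claims:
            conf[(item, obj)] += trust[s]
        totals = defaultdict(float)
        for (item, _), c in conf.items():
            totals[item] += c
        for key in conf:
            conf[key] /= totals[key[0]]

        # Step 2: a source's trust is the mean confidence of its objects.
        sums, counts = defaultdict(float), defaultdict(int)
        for s, item, obj in claims:
            sums[s] += conf[(item, obj)]
            counts[s] += 1
        trust = {s: sums[s] / counts[s] for s in sources}

    # Pick the highest-confidence object for each item.
    truths = {}
    for (item, obj), c in conf.items():
        if item not in truths or c > conf[(item, truths[item])]:
            truths[item] = obj
    return truths, trust


# Toy data (invented): three sources, two conflicting items.
example_claims = [
    ("A", "x", 1), ("B", "x", 1), ("C", "x", 2),
    ("A", "y", "g"), ("C", "y", "f"),
]
truths, trust = iterative_truth_discovery(example_claims)
print(truths, trust)
```

In this sketch a source's trust is averaged over only the objects it provides, so a source with a single claim gets an estimate resting on one data point. That is the sensitivity to the number of objects per source that the abstract identifies as the obstacle in scale-free Linked Data.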
