Exploiting Source-Object Networks to Resolve Object Conflicts in Linked Data

Considerable effort has gone into increasing the scale of Linked Data. However, integrating data from multiple sources raises an inevitable problem: different sources often provide conflicting objects for the same predicate of the same real-world entity, which is known as the object conflict problem. This problem has so far received little attention in the Linked Data community. In this paper, we first formalize object conflict resolution as computing the joint distribution of variables on a heterogeneous information network called the Source-Object Network, which captures three correlations involving objects and Linked Data sources. We then introduce ObResolution (object resolution), a novel approach based on network effects that identifies the true object among multiple conflicting objects. ObResolution adopts a pairwise Markov Random Field (pMRF) to model all evidence under a unified framework. Extensive experiments on six real-world datasets show that our method achieves higher accuracy than existing approaches and is robust and consistent across domains.
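
To make the pMRF formulation concrete, the following is a minimal sketch of how a joint distribution over a small source-object network could be computed. It is not the ObResolution algorithm: the toy network, the node and edge potentials, and every numeric value are illustrative assumptions, and inference is done by brute-force enumeration, which is feasible only for a handful of nodes.

```python
from itertools import product

# Labels: 1 = reliable source / true object, 0 = unreliable source / false object.
# The network below is a hypothetical toy example, not data from the paper.
SOURCES = ["s1", "s2", "s3"]
OBJECTS = ["o1", "o2"]
NODES = SOURCES + OBJECTS

# Assumed "provides" edges: s1 and s2 claim o1, s3 claims o2.
SUPPORT_EDGES = [("s1", "o1"), ("s2", "o1"), ("s3", "o2")]
# Assumed conflict edge: o1 and o2 compete for the same predicate of one entity.
CONFLICT_EDGES = [("o1", "o2")]


def node_potential(node, label):
    """Prior belief per node (assumed values): sources lean reliable, objects are uniform."""
    if node in SOURCES:
        return 0.7 if label == 1 else 0.3
    return 0.5


def support_potential(source_label, object_label):
    """Assumed edge potential: reliable sources tend to provide true objects."""
    return 0.9 if source_label == object_label else 0.1


def conflict_potential(label_a, label_b):
    """Assumed edge potential: conflicting objects are unlikely to share a label."""
    return 0.9 if label_a != label_b else 0.1


def unnormalized_score(assignment):
    """Product of all node and edge potentials for one full labeling."""
    score = 1.0
    for node, label in assignment.items():
        score *= node_potential(node, label)
    for u, v in SUPPORT_EDGES:
        score *= support_potential(assignment[u], assignment[v])
    for u, v in CONFLICT_EDGES:
        score *= conflict_potential(assignment[u], assignment[v])
    return score


def object_marginals():
    """Enumerate every labeling, normalize, and return P(object is true)."""
    totals = {obj: 0.0 for obj in OBJECTS}
    partition = 0.0
    for labels in product([0, 1], repeat=len(NODES)):
        assignment = dict(zip(NODES, labels))
        score = unnormalized_score(assignment)
        partition += score
        for obj in OBJECTS:
            if assignment[obj] == 1:
                totals[obj] += score
    return {obj: totals[obj] / partition for obj in OBJECTS}


marginals = object_marginals()
best = max(marginals, key=marginals.get)
print(marginals, best)
```

Under these assumed potentials, `o1` receives the higher marginal because two sources support it while only one supports `o2`. On a realistically sized network, exhaustive enumeration would have to be replaced by an approximate inference method.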
