DeFacto - Temporal and multilingual Deep Fact Validation

One of the main tasks when creating and maintaining knowledge bases is to validate facts and provide sources for them, in order to ensure the correctness and traceability of the provided knowledge. So far, this task is often addressed by human curators in a three-step process: issuing appropriate keyword queries to standard search engines for the statement to check, retrieving potentially relevant documents, and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming, as the experts have to carry out several search processes and must often read several documents. In this article, we present DeFacto (Deep Fact Validation), an algorithm able to validate facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of web pages as well as useful additional information, including a score for the confidence DeFacto has in the correctness of the input fact. To achieve this goal, DeFacto collects and combines evidence from web pages written in several languages. In addition, DeFacto provides support for facts with a temporal scope, i.e., it can estimate in which time frame a fact was valid. Given that the automatic evaluation of facts has received little attention so far, generic benchmarks for evaluating such frameworks were not previously available. We thus also present a generic evaluation framework for fact checking and make it publicly available.
