CoBayes: bayesian knowledge corroboration with assessors of unknown areas of expertise

Our work aims at building probabilistic tools for constructing and maintaining large-scale knowledge bases containing entity-relationship-entity triples (statements) extracted from the Web. In order to mitigate the uncertainty inherent in information extraction and integration we propose leveraging the "wisdom of the crowds" by aggregating truth assessments that users provide about statements. The suggested method, CoBayes, operates on a collection of statements, a set of deduction rules (e.g. transitivity), a set of users, and a set of truth assessments of users about statements. We propose a joint probabilistic model of the truth values of statements and the expertise of users for assessing statements. The truth values of statements are interconnected through derivations based on the deduction rules. The correctness of a user's assessment for a given statement is modeled by linear mappings from user descriptions and statement descriptions into a common latent knowledge space where the inner product between user and statement vectors determines the probability that the user assessment for that statement will be correct. Bayesian inference in this complex graphical model is performed using mixed variational and expectation propagation message passing. We demonstrate the viability of CoBayes in comparison to other approaches, on realworld datasets and user feedback collected from Amazon Mechanical Turk.

[1]  Charles M. Bishop,et al.  Variational Message Passing , 2005, J. Mach. Learn. Res..

[2]  Daniel N. Osherson,et al.  Aggregating disparate estimates of chance , 2006, Games Econ. Behav..

[3]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[4]  Dan Brickley,et al.  Rdf vocabulary description language 1.0 : Rdf schema , 2004 .

[5]  Gerhard Weikum,et al.  WWW 2007 / Track: Semantic Web Session: Ontologies ABSTRACT YAGO: A Core of Semantic Knowledge , 2022 .

[6]  Christopher Ré,et al.  Probabilistic databases: diamonds in the dirt , 2009, CACM.

[7]  Andrew McCallum,et al.  Introduction to Statistical Relational Learning , 2007 .

[8]  Jaime Teevan,et al.  Implicit feedback for inferring user preference: a bibliography , 2003, SIGF.

[9]  A. P. Dawid,et al.  Maximum Likelihood Estimation of Observer Error‐Rates Using the EM Algorithm , 1979 .

[10]  Ben Taskar,et al.  Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning) , 2007 .

[11]  Thore Graepel,et al.  Matchbox: large scale online bayesian recommendations , 2009, WWW '09.

[12]  Tom Minka,et al.  A family of algorithms for approximate Bayesian inference , 2001 .

[13]  Jens Lehmann,et al.  DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.

[14]  Gjergji Kasneci,et al.  Bayesian Knowledge Corroboration with Logical Rules and User Feedback , 2010, ECML/PKDD.

[15]  David J. C. MacKay,et al.  Choice of Basis for Laplace Approximation , 1998, Machine Learning.

[16]  Audun Jøsang,et al.  Exploring Different Types of Trust Propagation , 2006, iTrust.

[17]  Thore Graepel,et al.  Matchbox: Large Scale Bayesian Recommendations , 2009 .

[18]  Gerardo Hermosillo,et al.  Learning From Crowds , 2010, J. Mach. Learn. Res..

[19]  Serge Abiteboul,et al.  Corroborating information from disagreeing views , 2010, WSDM '10.

[20]  Brendan J. Frey,et al.  A Revolution: Belief Propagation in Graphs with Cycles , 1997, NIPS.

[21]  Herman J. ter Horst,et al.  Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary , 2005, J. Web Semant..