Latent Credibility Analysis

A frequent problem when dealing with data gathered from multiple sources on the web (ranging from booksellers to Wikipedia pages to stock analyst predictions) is that these sources disagree, and we must decide which of their (often mutually exclusive) claims we should accept. Current state-of-the-art information credibility algorithms known as "fact-finders" are transitive voting systems with rules specifying how votes iteratively flow from sources to claims and then back to sources. While this is quite tractable and often effective, fact-finders also suffer from substantial limitations; in particular, a lack of transparency obfuscates their credibility decisions and makes them difficult to adapt and analyze: knowing the mechanics of how votes are calculated does not readily tell us what those votes mean, and finding, for example, that a source has a score of 6 is not informative. We introduce a new approach to information credibility, Latent Credibility Analysis (LCA), constructing strongly principled, probabilistic models where the truth of each claim is a latent variable and the credibility of a source is captured by a set of model parameters. This gives LCA models clear semantics and modularity that make extending them to capture additional observed and latent credibility factors straightforward. Experiments over four real-world datasets demonstrate that LCA models can outperform the best fact-finders in both unsupervised and semi-supervised settings.
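
To make the idea concrete, here is a minimal sketch of an LCA-style model in the spirit the abstract describes: the truth of each claim is a latent variable, each source's credibility is a model parameter (here a single "honesty" probability), and both are estimated jointly with EM. This is an illustrative assumption-laden toy, not the paper's exact formulation; the function `simple_lca`, the honesty parameterization, and the toy data are all hypothetical.

```python
# Hedged sketch of a SimpleLCA-style model (an assumption about the paper's
# simplest variant, not its exact formulation). Claims are grouped into
# mutually exclusive sets: for each "question", exactly one answer is true.
from collections import defaultdict

def simple_lca(claims, n_iters=20):
    """claims: {question: {answer: set_of_sources_asserting_it}}.
    Returns (belief, honesty): posterior truth probabilities per answer,
    and a learned honesty parameter per source."""
    sources = {s for q in claims for a in claims[q] for s in claims[q][a]}
    honesty = {s: 0.8 for s in sources}  # uninformed initial credibility
    belief = {}
    for _ in range(n_iters):
        # E-step: posterior that each candidate answer is the true one,
        # given the current honesty parameters. A source asserting answer a
        # contributes honesty[s] if a is true, else spreads its error mass
        # uniformly over the m-1 wrong alternatives.
        belief = {}
        for q, answers in claims.items():
            m = len(answers)
            scores = {}
            for a in answers:
                p = 1.0
                for b, srcs in answers.items():
                    for s in srcs:
                        p *= honesty[s] if b == a else (1.0 - honesty[s]) / max(m - 1, 1)
                scores[a] = p
            z = sum(scores.values()) or 1.0
            belief[q] = {a: scores[a] / z for a in scores}
        # M-step: a source's honesty is the expected fraction of its
        # asserted claims that are true under the current beliefs.
        num, den = defaultdict(float), defaultdict(float)
        for q, answers in claims.items():
            for a, srcs in answers.items():
                for s in srcs:
                    num[s] += belief[q][a]
                    den[s] += 1.0
        honesty = {s: num[s] / den[s] for s in sources}
    return belief, honesty

# Toy example: two sources back "Canberra", one dissenter backs "Sydney";
# the dissenter is also alone on a second question, so its honesty drops.
claims = {
    "capital_of_australia": {"Canberra": {"s1", "s2"}, "Sydney": {"s3"}},
    "capital_of_canada": {"Ottawa": {"s1"}, "Toronto": {"s3"}},
}
belief, honesty = simple_lca(claims)
```

Note how the learned parameters have clear semantics, which is the abstract's point: `honesty["s1"]` is directly interpretable as an estimated probability of asserting true claims, unlike an opaque fact-finder vote score of, say, 6.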
