Paper information - On Bayesian interpretation of fact-finding in information networks

When information sources are unreliable, information networks have been used in the data mining literature to uncover facts from large numbers of complex relations between noisy variables. The approach relies on topology analysis of graphs, where nodes represent pieces of (unreliable) information and links represent abstract relations. Such topology analysis has often been shown empirically to be quite powerful in extracting useful conclusions from large amounts of poor-quality information. However, no systematic analysis has been proposed for quantifying the accuracy of such conclusions. In this paper, we present, for the first time, a Bayesian interpretation of the basic mechanism used in fact-finding from information networks. This interpretation leads to a direct quantification of the accuracy of conclusions obtained from information network analysis. Hence, we provide a general foundation for using information network analysis not only to heuristically extract likely facts, but also to quantify, in an analytically founded manner, the probability that each fact or source is correct. This probability constitutes a measure of quality of information (QoI). The paper thus presents a new foundation for QoI analysis in information networks that is of great value in deriving information from unreliable sources. The framework is applied to a representative fact-finding problem and is validated by extensive simulation, in which the analysis shows significant improvement over past work and close correspondence with ground truth.
