Content-driven trust propagation framework

Existing fact-finding models assume the availability of structured data or accurate information extraction. However, as online data becomes increasingly unstructured, these assumptions no longer hold. To overcome this, we propose a novel content-based trust propagation framework that relies on signals from textual content to ascertain the veracity of free-text claims and to compute the trustworthiness of their sources. We incorporate the quality of relevant content into the framework and present an iterative algorithm for propagating trust scores. We show that existing fact finders on structured data can be modeled as specific instances of this framework. Using a retrieval-based approach to find relevant articles, we instantiate the framework to compute the trustworthiness of news sources and articles. We show that the proposed framework ascertains the trustworthiness of sources more effectively. We also show that ranking news articles by the trustworthiness learned from the content-driven framework is significantly better than baselines that ignore either the content quality or the trust framework.
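The iterative propagation of trust scores can be pictured with a minimal sketch. It assumes a Sums-style (Hubs-and-Authorities) fact finder in which every source-claim link is scaled by a content-quality weight, for example a retrieval relevance score for the supporting article. The function name `propagate_trust`, the weights, and the max-normalization are illustrative assumptions, not the paper's exact algorithm.

```python
from collections import defaultdict


def propagate_trust(edges, iterations=20):
    """Alternately update claim beliefs and source trust.

    edges: list of (source, claim, weight) triples. The weight is a
    hypothetical content-quality score in [0, 1] for the evidence that
    links the source to the claim (an assumption of this sketch).
    """
    if not edges:
        return {}, {}
    trust = {s: 1.0 for s, _, _ in edges}   # start with uniform source trust
    belief = {c: 0.0 for _, c, _ in edges}

    for _ in range(iterations):
        # Belief in a claim: quality-weighted sum of the trust of its sources.
        belief = defaultdict(float)
        for s, c, w in edges:
            belief[c] += w * trust[s]
        top = max(belief.values()) or 1.0    # avoid dividing by zero
        belief = {c: b / top for c, b in belief.items()}

        # Trust in a source: quality-weighted sum of belief in its claims.
        trust = defaultdict(float)
        for s, c, w in edges:
            trust[s] += w * belief[c]
        top = max(trust.values()) or 1.0
        trust = {s: t / top for s, t in trust.items()}

    return trust, belief


# Hypothetical example: siteC backs one claim with low-quality content.
edges = [
    ("siteA", "claim1", 0.9),
    ("siteA", "claim2", 0.8),
    ("siteB", "claim1", 0.7),
    ("siteC", "claim3", 0.2),
]
trust, belief = propagate_trust(edges)
print(sorted(trust.items(), key=lambda kv: -kv[1]))
```

In this sketch, setting every weight to 1.0 recovers the unweighted Sums update, which mirrors the abstract's point that fact finders on structured data are special cases of a content-driven framework; the weights are where content quality enters.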
