Computational Fact Checking from Knowledge Networks

Traditional fact checking by expert journalists cannot keep up with the enormous volume of information now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem, this approach is feasible with efficient computational techniques. We evaluate it by examining tens of thousands of claims related to history, entertainment, geography, and biographical information, using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support from our method than false ones do. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.
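To make the shortest-path idea concrete, the following is a minimal sketch, not the method as specified by the authors. The abstract does not define the semantic proximity metric, so the sketch assumes one plausible choice: a path is penalized by the logarithm of the degree of each intermediate node, so that paths through very generic, highly connected concepts count as weaker evidence. The function name `path_support`, the adjacency-dictionary representation, and the toy graph are all hypothetical and used only for illustration.

```python
import heapq
import math


def path_support(graph, source, target):
    """Score in (0, 1] for the best path from source to target, 0.0 if none.

    `graph` maps each node to the set of its neighbours (undirected).
    ASSUMPTION: the cost of a path is the sum of log(degree) over its
    intermediate nodes, and support = 1 / (1 + cost). This metric is an
    illustrative choice, not taken from the text above.
    """
    if source not in graph or target not in graph:
        return 0.0
    if source == target:
        return 1.0

    # Dijkstra's algorithm; all step costs are >= 0 because degree >= 1.
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return 1.0 / (1.0 + cost)
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nbr in graph[node]:
            # The target is an endpoint, so entering it is not penalized.
            step = 0.0 if nbr == target else math.log(len(graph[nbr]))
            new_cost = cost + step
            if new_cost < best.get(nbr, math.inf):
                best[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))
    return 0.0  # target unreachable


# Toy knowledge graph (hypothetical data, for illustration only).
toy_graph = {
    "Rome": {"Italy"},
    "Italy": {"Rome", "Europe", "Milan"},
    "Milan": {"Italy"},
    "Europe": {"Italy", "France"},
    "France": {"Europe", "Paris"},
    "Paris": {"France"},
    "Atlantis": set(),
}

direct = path_support(toy_graph, "Rome", "Italy")
indirect = path_support(toy_graph, "Rome", "Paris")
missing = path_support(toy_graph, "Rome", "Atlantis")
```

Under this assumed metric, a claim linking two directly connected concepts receives the maximum score, a claim whose concepts are joined only through several intermediate nodes scores lower, and a claim with no connecting path scores zero.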
