Fact Checking in Large Knowledge Graphs - A Discriminative Predicate Path Mining Approach

Traditional fact checking by experts and analysts cannot keep pace with the volume of newly created information. It is therefore important to improve our ability to determine computationally whether a statement of fact is true or false. We view this problem as a link-prediction task in a knowledge graph and show that a new model of the top discriminative predicate paths is able to understand the meaning of a statement and accurately determine its veracity. We evaluate our approach by examining thousands of claims related to history, geography, biology, and politics using a public, million-node knowledge graph extracted from Wikipedia and PubMedDB. Not only does our approach significantly outperform related models, but the discriminative predicate path model is also easily interpretable and provides sensible reasons for the final determination.
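To make the idea of a predicate path concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy knowledge graph stored as (subject, predicate, object) triples and enumerates the predicate sequences that connect two entities. The function name `predicate_paths`, the `max_len` parameter, the `^-1` suffix for reversed edges, and the example triples are all hypothetical choices made for illustration; the abstract does not specify how paths are mined or ranked.

```python
from collections import defaultdict


def predicate_paths(triples, source, target, max_len=3):
    """Return predicate-label sequences along simple paths from source to target.

    Hypothetical helper for illustration only. Edges may be followed in either
    direction; a reversed edge is labelled with a '^-1' suffix.
    """
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
        adj[o].append((p + "^-1", s))

    found = []
    # Depth-first search over (current node, predicates so far, visited nodes).
    stack = [(source, (), (source,))]
    while stack:
        node, preds, visited = stack.pop()
        if node == target and preds:
            found.append(preds)
            continue
        if len(preds) == max_len:
            continue
        for p, nxt in adj[node]:
            if nxt not in visited:
                stack.append((nxt, preds + (p,), visited + (nxt,)))
    return found


# Toy evidence graph (made-up triples). The claim under test,
# (Springfield, capitalOf, Illinois), is deliberately left out.
evidence = [
    ("Springfield", "locatedIn", "Illinois"),
    ("Chicago", "locatedIn", "Illinois"),
    ("Springfield", "headquarterOf", "StateGov"),
    ("StateGov", "jurisdiction", "Illinois"),
]

true_pair = set(predicate_paths(evidence, "Springfield", "Illinois"))
false_pair = set(predicate_paths(evidence, "Chicago", "Illinois"))

# Paths seen for the true pair but not for the false pair are candidate
# "discriminative" paths in this toy setting.
discriminative = true_pair - false_pair
print(discriminative)
```

In this toy setting the path shared by both pairs (`locatedIn`) says nothing about which city is the capital, while the path found only for the true pair does. A real system would score paths over many positive and negative examples rather than rely on a single set difference.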
