Discriminative predicate path mining for fact checking in knowledge graphs

Traditional fact checking by experts and analysts cannot keep pace with the volume of newly created information. It is therefore necessary to enhance our ability to computationally determine whether a statement of fact is true or false. We view this problem as a link-prediction task in a knowledge graph, and present a discriminative path-based method for fact checking in knowledge graphs that incorporates connectivity, type information, and predicate interactions. Given a statement S of the form (subject, predicate, object), for example, (Chicago, capitalOf, Illinois), our approach mines discriminative paths that alternatively define the generalized statement (U.S. city, predicate, U.S. state) and uses the mined rules to evaluate the veracity of statement S. We evaluate our approach by examining thousands of claims related to history, geography, biology, and politics using a public, million-node knowledge graph extracted from Wikipedia and PubMedDB. Our approach not only significantly outperforms related models; the discriminative predicate path model is also easily interpretable and provides sensible reasons for its final determination.
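To make the idea of discriminative predicate paths concrete, the following is a minimal sketch on a toy graph. It is not the authors' algorithm: the triples, the predicate names other than capitalOf, the helper functions, and the simple "count in true pairs minus count in false pairs" score are all assumptions chosen for illustration. It enumerates short predicate sequences connecting a subject to an object, keeps the sequences that occur more often for known capital/state pairs than for non-capital pairs, and checks a statement against them.

```python
from collections import Counter, defaultdict

# Toy knowledge graph; every triple here is illustrative, not from the paper's data.
TRIPLES = [
    ("Springfield", "capitalOf", "Illinois"),
    ("Springfield", "locatedIn", "Illinois"),
    ("Springfield", "headquarterOf", "IllinoisGov"),
    ("IllinoisGov", "governs", "Illinois"),
    ("Chicago", "locatedIn", "Illinois"),
    ("Austin", "capitalOf", "Texas"),
    ("Austin", "locatedIn", "Texas"),
    ("Austin", "headquarterOf", "TexasGov"),
    ("TexasGov", "governs", "Texas"),
    ("Houston", "locatedIn", "Texas"),
]


def build_adjacency(triples):
    """Index edges in both directions; inverse edges get a '^-1' suffix."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
        adj[o].append((p + "^-1", s))
    return adj


def predicate_paths(adj, source, target, target_predicate, max_len=2):
    """Predicate sequences on simple paths source -> target, up to max_len edges.

    The direct edge labelled with the predicate under test is discarded,
    so a statement cannot be used as evidence for itself.
    """
    found = set()

    def walk(node, preds, visited):
        if node == target and preds:
            found.add(tuple(preds))
            return
        if len(preds) == max_len:
            return
        for p, nxt in adj[node]:
            if nxt not in visited:
                walk(nxt, preds + [p], visited | {nxt})

    walk(source, [], {source})
    found.discard((target_predicate,))
    return found


def mine_discriminative_paths(adj, positives, negatives, target_predicate):
    """Keep paths seen more often for true pairs than for false pairs."""
    score = Counter()
    for s, o in positives:
        score.update(predicate_paths(adj, s, o, target_predicate))
    for s, o in negatives:
        score.subtract(predicate_paths(adj, s, o, target_predicate))
    return {path for path, value in score.items() if value > 0}


def is_supported(adj, rules, subject, obj, target_predicate):
    """A statement is supported if any mined path connects subject and object."""
    return bool(predicate_paths(adj, subject, obj, target_predicate) & rules)


adj = build_adjacency(TRIPLES)
positives = [("Springfield", "Illinois"), ("Austin", "Texas")]
negatives = [("Chicago", "Illinois"), ("Houston", "Texas")]
rules = mine_discriminative_paths(adj, positives, negatives, "capitalOf")
```

In this toy setting, locatedIn connects both capitals and non-capitals to their states and so cancels out, while the two-step sequence headquarterOf followed by governs appears only for the true pairs. The surviving sequence also reads as a plain explanation of why a statement is accepted or rejected.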
