LAUREN - Knowledge Graph Summarization for Question Answering

Besides the challenge that a human can ask one question in many different ways, a key aspect of Question Answering over Knowledge Graphs (KGQA) is dealing with the vast amount of information present in the knowledge graphs. Modern real-world knowledge graphs contain millions of entities and relationships, and they are enriched with new facts every day. However, not all facts are relevant for answering a particular question, which poses several challenges for KGQA systems that require interpretable and queryable data. One way to filter out the superfluous data in knowledge graphs is to rely on graph summarization techniques. Graph-based summarization approaches aim to reduce knowledge graphs to a more concise and precise form by storing only relevant information. In this paper, we propose a framework named LAUREN that applies different summarization techniques to knowledge graphs for use in KGQA systems. Our experiments show that LAUREN reduces large knowledge graphs such as DBpedia by 2 million entities, and that the resulting summary still achieves the same performance on both question answering and linking tasks as the complete DBpedia.
