Explainable Link Discovery: Multilingual Link Specification Verbalization and Summarization

Abstract

The number and size of datasets abiding by the Linked Data paradigm increase every day. Discovering links between these datasets is thus central to achieving the vision behind the Data Web. Declarative Link Discovery (LD) frameworks rely on complex Link Specifications (LS) to express the conditions under which two resources should be linked. Understanding such LS is not trivial for non-expert users, particularly when they want to create LS that match their needs. Even if the user applies a machine learning algorithm to generate the required LS automatically, the challenge of explaining the resulting LS persists. Hence, providing explainable LS is key to enabling users who are unfamiliar with the underlying LS technologies to use them effectively and efficiently. In this paper, we extend our previous work (Ahmed et al., 2019) by proposing a generic multilingual approach that verbalizes LS in many languages, i.e., converts LS into understandable natural language text. Specifically, we ported our LS verbalization framework to German and Spanish, in addition to English. Our adequacy and fluency evaluations show that our approach generates complete natural language descriptions that are easily understandable even by lay users. Moreover, we devised an experimental neural approach for improving the quality of our generated texts, which achieves promising results in terms of BLEU, METEOR, and chrF++.
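To make the idea of LS verbalization concrete, the following is a minimal, template-based sketch in Python. It is not the authors' implementation: the classes `Atomic` and `Complex`, the `TEMPLATES` wording, and the `verbalize` function are hypothetical. The sketch assumes an LS is a tree whose leaves compare one source property with one target property under a similarity measure and a threshold (roughly `AND(jaccard(x.label, y.name)|0.8, cosine(x.title, y.title)|0.5)` in a LIMES-style notation), and whose inner nodes combine two sub-specifications with AND, OR, or MINUS. As simplifications, operators carry no thresholds of their own and nested operators are just parenthesised. In this sketch, supporting a new language amounts to supplying one more set of templates.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, simplified LS model (not the LIMES or LSVS data structures).
@dataclass
class Atomic:
    measure: str        # similarity measure name, e.g. "jaccard"
    source_prop: str    # property of the source resource
    target_prop: str    # property of the target resource
    threshold: float    # minimum similarity for the condition to hold


@dataclass
class Complex:
    op: str             # "AND", "OR" or "MINUS"
    left: "LS"
    right: "LS"


LS = Union[Atomic, Complex]

# Illustrative per-language templates; wording chosen for this sketch only.
TEMPLATES = {
    "en": {
        "sentence": "Two resources are linked if {body}.",
        "atomic": "the {m} similarity between the source property '{s}' "
                  "and the target property '{t}' is at least {th}",
        "AND": "{l} and {r}",
        "OR": "{l} or {r}",
        "MINUS": "{l}, except when {r}",
        "decimal": ".",
    },
    "de": {
        "sentence": "Zwei Ressourcen werden verknüpft, wenn {body}.",
        "atomic": "die {m}-Ähnlichkeit zwischen dem Attribut „{s}“ der Quelle "
                  "und dem Attribut „{t}“ des Ziels mindestens {th} beträgt",
        "AND": "{l} und {r}",
        "OR": "{l} oder {r}",
        "MINUS": "{l}, außer wenn {r}",
        "decimal": ",",
    },
    "es": {
        "sentence": "Dos recursos se enlazan si {body}.",
        "atomic": "la similitud {m} entre el atributo «{s}» de la fuente "
                  "y el atributo «{t}» del destino es de al menos {th}",
        "AND": "{l} y {r}",
        "OR": "{l} o {r}",
        "MINUS": "{l}, excepto cuando {r}",
        "decimal": ",",
    },
}


def _clause(ls: LS, tpl: dict) -> str:
    """Recursively turn an LS (sub)tree into a clause in the template language."""
    if isinstance(ls, Atomic):
        # Format the threshold compactly (0.8 -> "0.8") with the language's decimal mark.
        th = f"{ls.threshold:g}".replace(".", tpl["decimal"])
        return tpl["atomic"].format(m=ls.measure, s=ls.source_prop,
                                    t=ls.target_prop, th=th)
    parts = []
    for child in (ls.left, ls.right):
        text = _clause(child, tpl)
        # Parenthesise nested operators so the reading stays unambiguous.
        parts.append(f"({text})" if isinstance(child, Complex) else text)
    return tpl[ls.op].format(l=parts[0], r=parts[1])


def verbalize(ls: LS, lang: str = "en") -> str:
    """Return a one-sentence natural-language description of an LS."""
    tpl = TEMPLATES[lang]
    return tpl["sentence"].format(body=_clause(ls, tpl))


if __name__ == "__main__":
    spec = Complex("AND",
                   Atomic("jaccard", "label", "name", 0.8),
                   Atomic("cosine", "title", "title", 0.5))
    for lang in ("en", "de", "es"):
        print(verbalize(spec, lang))
```

Running the script prints one sentence per language for the same AND specification. Plain string templates ignore morphology and sentence aggregation, so a full system would need a proper realiser for those; the template design is used here only because it shows the tree-to-text mapping in a few lines.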

References

[1] Eduard H. Hovy et al., Aggregation in Natural Language Generation, 1993, EWNLG

[2] Jens Lehmann et al., RAVEN - Active Learning of Link Specifications, 2011, OM

[3] Axel-Cyrille Ngonga Ngomo et al., EAGLE: Efficient Active Learning of Link Specifications Using Genetic Programming, 2012, ESWC

[4] Krys J. Kochut et al., Text Summarization Techniques: A Brief Survey, 2017, International Journal of Advanced Computer Science and Applications

[5] George D. C. Cavalcanti et al., Assessing Sentence Scoring Techniques for Extractive Text Summarization, 2013, Expert Syst. Appl.

[6] Sören Auer et al., LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data, 2011, IJCAI

[7] Kristiina Jokinen et al., Generating Responses and Explanations from RDF/XML and DAML+OIL, 2003

[8] Axel-Cyrille Ngonga Ngomo et al., LSVS: Link Specification Verbalization and Summarization, 2019, NLDB

[9] Norbert E. Fuchs, First-Order Reasoning for Attempto Controlled English, 2010, CNL

[10] Axel-Cyrille Ngonga Ngomo et al., Extracting Multilingual Natural-Language Patterns for RDF Predicates, 2012, EKAW

[11] Yoshua Bengio et al., Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR

[12] George R. Doddington et al., Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics, 2002

[13] Chris Mellish et al., The Semantic Web as a Linguistic Resource: Opportunities for Natural Language Generation, 2005, Knowl. Based Syst.

[14] Paul Buitelaar et al., Utilizing Knowledge Graphs for Neural Machine Translation Augmentation, 2019, K-CAP

[15] Christopher D. Manning et al., Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP

[16] Christophe Gravier et al., Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples, 2017, J. Web Semant.

[17] Jens Lehmann et al., DeFacto - Deep Fact Validation, 2012, SEMWEB

[18] Marcos André Gonçalves et al., Replica Identification Using Genetic Programming, 2008, SAC '08

[19] Florian Matthes et al., SimpleNLG-DE: Adapting SimpleNLG 4 to German, 2019, INLG

[20] Aleksander Pohl, The Polish Interface for Linked Open Data, 2010

[21] Alberto Bugarín et al., Adapting SimpleNLG to Spanish, 2017, INLG

[22] Robert Isele et al., Efficient Multidimensional Blocking for Link Discovery without Losing Recall, 2011, WebDB

[23] Ehud Reiter et al., Book Reviews: Building Natural Language Generation Systems, 2000, CL

[24] Andreas Thor et al., Comparative Evaluation of Entity Resolution Approaches with FEVER, 2009, Proc. VLDB Endow.

[25] Jens Lehmann et al., Wombat - A Generalization Approach for Automatic Link Discovery, 2017, ESWC

[26] Rico Sennrich et al., Neural Machine Translation of Rare Words with Subword Units, 2015, ACL