BENGAL: An Automatic Benchmark Generator for Entity Recognition and Linking

The manual creation of gold standards for named entity recognition and entity linking is time- and resource-intensive. Moreover, recent work shows that such gold standards contain a large proportion of mistakes and are difficult to maintain. We hence present BENGAL, a novel approach for the automatic generation of such gold standards as a complement to manually created benchmarks. The main advantage of our benchmarks is that they can be readily generated at any time. They are also cost-effective while being guaranteed to be free of annotation errors. We compare the performance of 11 tools on English benchmarks generated by BENGAL and on 16 manually created benchmarks. We show that our approach can be ported easily across languages by presenting the results achieved by 4 tools on both Brazilian Portuguese and Spanish. Overall, our results suggest that our automatic benchmark generation approach can create varied benchmarks whose characteristics are similar to those of existing benchmarks. Our approach is open-source. Our experimental results are available at this http URL and the code at this https URL.
