Characterizing mention mismatching problems for improving recognition results

Mentions of real-world things that software tools recognize in text often do not match the ground truth. This paper proposes a formal classification of mention mismatching problems, including partial matching. It then presents evidence that some longer mentions are associated with higher precision and refer to more specific things than the shorter mentions they overlap. Based on this evidence, it proposes algorithms that automatically improve mentions by enlarging them whenever, and as much as, possible. Experiments applying a variety of state-of-the-art annotation tools to several datasets built from real-world texts show that over-segmentation (a returned mention contained in the corresponding ground-truth mention) is the most prevalent partial matching problem in the proposed classification. In addition, some of the proposed mention-enhancing algorithms corrected most of the over-segmented mentions returned by the tools used in experiments with prominent benchmarks, leading to gains in precision and recall.
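To make the notion of partial matching concrete, the following is a minimal sketch, not the paper's formal classification or its algorithms. It assumes mentions are half-open character spans, and every name in it (`Span`, `classify`, `merge_overlapping`) is hypothetical. Only the over-segmentation label comes from the abstract; the other labels are placeholders chosen for illustration. The merging step is one simple way to enlarge mentions, assumed here for the example.

```python
from typing import List, Tuple

# Assumption: a mention is a half-open character span [start, end).
Span = Tuple[int, int]


def classify(returned: Span, gold: Span) -> str:
    """Label how a returned mention relates to a ground-truth mention."""
    r_start, r_end = returned
    g_start, g_end = gold
    if returned == gold:
        return "exact"
    if r_end <= g_start or g_end <= r_start:
        return "disjoint"  # placeholder label, not from the paper
    if g_start <= r_start and r_end <= g_end:
        # Returned mention is contained in the ground-truth mention.
        return "over-segmentation"
    if r_start <= g_start and g_end <= r_end:
        return "returned-contains-gold"  # placeholder label
    return "crossing-overlap"  # placeholder label


def merge_overlapping(spans: List[Span]) -> List[Span]:
    """Enlarge mentions by merging every group of overlapping spans."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous span: extend it to cover both.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


text = "University of California, Berkeley is in California."
gold = (0, 34)      # "University of California, Berkeley"
shorter = (14, 24)  # "California"
label = classify(shorter, gold)
enlarged = merge_overlapping([(0, 24), (14, 34)])
```

In this example the shorter span falls inside the ground-truth span, so it is labeled as over-segmented, and merging the two overlapping candidate spans recovers the full ground-truth span. Whether merging is safe in practice depends on the evidence the paper reports about longer mentions.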
