Improving Language-Dependent Named Entity Detection

Named Entity Recognition (NER) and Named Entity Linking (NEL) are two research areas that have advanced substantially in recent years. Most of this research targets English, so some of the improvements are language-dependent and do not necessarily yield better results when applied to other languages. This paper therefore discusses TOMO, an approach to language-aware named entity detection, and evaluates it for German. The evaluation required developing a German gold standard dataset, which was based on the English dataset used by the OKE 2016 challenge. Named entity detection was evaluated using the web-based platform GERBIL, and the results show that our approach produced higher F1 values than the other annotators. This indicates that language-dependent features improve the overall quality of the spotter.
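
To make the reported metric concrete, the following is a minimal sketch of how precision, recall, and F1 can be computed for a detection (spotting) task when a predicted mention counts as correct only if its character offsets match a gold mention exactly. It is an illustration under stated assumptions, not GERBIL's implementation and not the evaluation code used for TOMO; the function name `span_f1` and the example offsets are hypothetical.

```python
from typing import Set, Tuple

# A mention is represented by its (start, end) character offsets.
# This representation is an assumption made for illustration.
Span = Tuple[int, int]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact span matching."""
    # A prediction is a true positive only if the same span is in the gold set.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical offsets: three gold mentions, four predicted, two in common.
gold = {(0, 6), (20, 26), (40, 47)}
predicted = {(0, 6), (20, 26), (30, 35), (50, 55)}
p, r, f = span_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact matching is the strictest choice; a more lenient variant that accepts overlapping spans would generally yield higher scores for the same annotator output.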
