The First Cross-Lingual Challenge on Recognition, Normalization, and Matching of Named Entities in Slavic Languages

This paper describes the outcomes of the first challenge on multilingual named entity recognition, which aimed at recognizing mentions of named entities in web documents in Slavic languages, normalizing/lemmatizing them, and matching them across languages. It was organised in the context of the 6th Balto-Slavic Natural Language Processing Workshop, co-located with the EACL 2017 conference. Although eleven teams signed up for the evaluation, only two submitted results on time, owing to the complexity of the task(s) and the short time available for developing a solution. The reported evaluation figures reflect the relatively high complexity of named entity-related tasks when processing texts in Slavic languages. Since the challenge continues beyond the publication date of this paper, an updated picture of the participating systems and their performance can be found on the web page of the challenge.
