Serbian NER&Beyond: The Archaic and the Modern Intertwined

In this work, we present a Serbian literary corpus that is being developed under the umbrella of the “Distant Reading for European Literary History” COST Action CA16204. Using this corpus of novels written more than a century ago, we have developed and made publicly available a Named Entity Recognizer (NER) with a Convolutional Neural Network (CNN) architecture, trained to recognize seven named entity types. It achieves an F1 score of approximately 91% on the test dataset. The model was further assessed on a separate evaluation dataset. We conclude by comparing the developed model with an existing NER system for Serbian and discussing the strengths and weaknesses of both.
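The F1 score reported above is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below shows how these three figures can be computed from gold and predicted entity annotations. It is a minimal illustration that assumes exact-match, entity-level scoring (a prediction counts only if its boundaries and type both match a gold entity); the abstract does not state which matching criterion was used. The span representation, the function name `prf1`, and the labels and spans in the toy data are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical span representation: (document_id, start_token, end_token, label).
# This is an illustrative assumption, not the corpus's actual annotation format.
Span = Tuple[str, int, int, str]


def prf1(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Micro-averaged, exact-match entity-level precision, recall and F1."""
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    # Multiset intersection: a predicted span is a true positive only if
    # an identical (boundaries + label) span exists in the gold annotations.
    tp = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: the third prediction has the right boundaries but
# the wrong label, and one gold entity is missed entirely.
gold = [("d1", 0, 2, "PERS"), ("d1", 5, 6, "LOC"), ("d1", 9, 10, "ORG"), ("d1", 12, 13, "PERS")]
pred = [("d1", 0, 2, "PERS"), ("d1", 5, 6, "LOC"), ("d1", 9, 10, "PERS")]

p, r, f = prf1(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.500 F1=0.571
```

Because the same function can be applied to the output of each system on a shared set of gold annotations, it also offers one simple way to set up a side-by-side comparison of two NER models; per-type scores can be obtained by filtering both span lists by label before calling it.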
