HFST-SweNER — A New NER Resource for Swedish

Named entity recognition (NER) is a knowledge-intensive information extraction task that identifies textual mentions of entities belonging to a predefined set of categories, such as locations, organizations and time expressions. NER is a challenging yet essential preprocessing step for many natural language processing applications, and is particularly crucial for language understanding. It has been actively explored in academia and industry, especially in recent years with the advent of social media data. This paper describes the conversion, modeling and adaptation of a Swedish NER system from a hybrid environment, with integrated functionality from various processing components, to the Helsinki Finite-State Transducer Technology (HFST) platform. The new HFST-based system (HFST-SweNER) is a full-fledged open source implementation that supports a variety of generic named entity types and consists of multiple reusable resource layers, e.g., various n-gram-based named entity lists (gazetteers).
