This chapter presents the application of Entropy Guided Transformation Learning (ETL) to language-independent named entity recognition (NER). The NER task consists of finding all proper nouns in a text and classifying them into a given set of categories of interest. We apply ETL and ETL Committee to three corpora in three languages: Portuguese, Spanish and Dutch. The ETL system achieves results competitive with the state of the art on all three corpora. Moreover, ETL Committee significantly improves on the ETL results for all three corpora. This chapter is organized as follows. In Sect. 7.1, we describe the NER task and the selected corpora. In Sect. 7.2, we detail some modeling configurations used in our NER system. In Sect. 7.3, we show some configurations used in the machine learning algorithms. Section 7.4 presents the application of ETL to the HAREM corpus. In Sect. 7.5, we present the application of ETL to the SPA CoNLL-2002 corpus. In Sect. 7.6, we detail the application of ETL to the DUT CoNLL-2002 corpus. Finally, Sect. 7.7 presents some concluding remarks.