Part-of-Speech Tagging

This chapter presents the application of ETL to language-independent part-of-speech (POS) tagging. The POS tagging task consists of assigning a POS or another lexical class marker to each word in a text. We apply ETL and ETL Committee to four corpora in three languages: Portuguese, German and English. The ETL system achieves state-of-the-art results on all four corpora. The ETL Committee strategy slightly improves on the ETL accuracy for every corpus.

This chapter is organized as follows. In Sect. 5.1, we describe the task and the selected corpora. In Sect. 5.2, we detail the modeling configurations used in our POS tagging system. In Sect. 5.3, we show the configurations used in the machine learning algorithms. Section 5.4 presents the application of ETL to the Mac-Morpho Corpus. In Sect. 5.5, we describe the application of ETL to the Tycho Brahe Corpus. Section 5.6 presents the application of ETL to the TIGER Corpus. In Sect. 5.7, we show the application of ETL to the Brown Corpus. Finally, Sect. 5.8 presents some concluding remarks.