Effective Architecture of the Polish Tagger

The large tagset of the IPI PAN Corpus of Polish and the limited size of the learning corpus make the construction of a tagger especially demanding. The goal of this work is to decompose the overall process of tagging Polish into subproblems of partial disambiguation. Moreover, a tagger architecture that facilitates this decomposition is proposed. The proposed architecture enables easy integration of hand-written tagging rules with the rest of the tagger and is open to different types of classifiers. A complete tagger for Polish, called TaKIPI, is also presented. Its configuration, the achieved results (92.55% accuracy for all tokens and 84.75% for ambiguous tokens in a ten-fold test), and the considered variants of the architecture are also discussed.
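
The idea of decomposing tagging into subproblems of partial disambiguation can be illustrated with a minimal sketch. It is not TaKIPI's implementation: the two-attribute tags, the `make_layer` and `tag_token` helpers, the single hand-written rule, and the stub classifiers are all hypothetical and chosen only for illustration. Each layer narrows the candidate tags by fixing one attribute, consulting a hand-written rule first and falling back to a classifier when the rule abstains.

```python
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical simplification: a tag is a (grammatical class, case) pair.
Tag = Tuple[str, str]
# A chooser sees the previous word and the candidate tags and returns
# a value for one attribute, or None to abstain.
Chooser = Callable[[str, Set[Tag]], Optional[str]]
Layer = Callable[[str, Set[Tag]], Set[Tag]]


def make_layer(attr: int, rule: Chooser, classifier: Chooser) -> Layer:
    """Build a layer that partially disambiguates one tag attribute."""
    def layer(prev_word: str, candidates: Set[Tag]) -> Set[Tag]:
        values = {tag[attr] for tag in candidates}
        if len(values) <= 1:
            return candidates  # nothing to decide at this layer
        choice = rule(prev_word, candidates)
        if choice not in values:  # rule abstained or proposed an impossible value
            choice = classifier(prev_word, candidates)
        if choice not in values:
            return candidates  # leave the ambiguity for later processing
        return {tag for tag in candidates if tag[attr] == choice}
    return layer


def tag_token(prev_word: str, candidates: Set[Tag], layers: List[Layer]) -> Set[Tag]:
    """Apply the layers in order, each one narrowing the candidate set."""
    for layer in layers:
        candidates = layer(prev_word, candidates)
    return candidates


# Toy hand-written rule (assumption): after "przez" choose the accusative.
def case_rule(prev_word: str, candidates: Set[Tag]) -> Optional[str]:
    return "acc" if prev_word == "przez" else None


# Stub choosers standing in for trained classifiers (assumption).
def no_rule(prev_word: str, candidates: Set[Tag]) -> Optional[str]:
    return None

def class_classifier(prev_word: str, candidates: Set[Tag]) -> Optional[str]:
    return "subst"

def case_classifier(prev_word: str, candidates: Set[Tag]) -> Optional[str]:
    return "nom"


layers = [
    make_layer(0, no_rule, class_classifier),   # layer 1: grammatical class
    make_layer(1, case_rule, case_classifier),  # layer 2: case
]

# Toy candidates for "piec": noun (nominative or accusative) or infinitive.
piec = {("subst", "nom"), ("subst", "acc"), ("inf", "-")}
after_przez = tag_token("przez", piec, layers)
after_to = tag_token("to", piec, layers)
```

In this sketch the first layer removes the infinitive reading, and the second layer resolves case, with the rule deciding after "przez" and the stub classifier deciding otherwise. Swapping a stub for a different classifier type, or adding further rules, changes only the arguments passed to `make_layer`.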