Paper Information - Redefinition of Turkish Morphology Using Flag Diacritics

Redefinition of Turkish Morphology Using Flag Diacritics

This paper primarily discusses how to model Turkish morphotactics using flag diacritics. We present a two-level Turkish morphological analyzer based on a lexicon of word lemmata with over 49,321 entries, as well as an auxiliary unknown word analyzer. Our main analyzer demonstrates the use of flag diacritics for Turkish, an approach that has not yet been well researched for the language. Turkish is an agglutinative language with many exceptions to its phonetic and morphological rules, and flag diacritics are useful in handling these exceptions. Our unknown word analyzer operates without an extra lexicon, using affix stripping to find word lemmata by recursively removing affixes. We use this methodology to find all possible lemmata that are not in our lexicon.
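The affix-stripping idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analyzer: the suffix inventory `SUFFIXES`, the function name `strip_affixes`, and the `min_stem` cutoff are all assumptions made for illustration, and the sketch ignores vowel harmony, consonant alternations, and morphotactic ordering. It only shows how recursively removing suffixes yields a set of candidate lemmata without consulting a lexicon.

```python
# Hypothetical, deliberately tiny suffix inventory (an assumption for this
# sketch; a real Turkish analyzer would need a far larger, ordered set).
SUFFIXES = ("ler", "lar", "den", "dan", "de", "da", "in", "i")


def strip_affixes(word, min_stem=2):
    """Return every candidate lemma reachable by recursively removing suffixes.

    The word itself is always a candidate. A suffix is removed only if the
    remaining stem keeps at least `min_stem` characters.
    """
    candidates = {word}
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if len(stem) >= min_stem:
                # Recurse on the shorter form to peel off further suffixes.
                candidates |= strip_affixes(stem, min_stem)
    return candidates


if __name__ == "__main__":
    # "evlerden" ("from the houses") -> evlerden, evler, ev
    print(sorted(strip_affixes("evlerden")))
```

Because no lexicon is consulted, every intermediate form is kept as a candidate; a downstream step would have to rank or filter them.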

Gülşen Eryiğit | Umut Sulubacak | Muhammet Şahin
