Redefinition of Turkish Morphology Using Flag Diacritics

This paper discusses how to model Turkish morphotactics using flag diacritics. We present a two-level Turkish morphological analyzer based on a lemma lexicon with over 49,321 entries, together with an auxiliary analyzer for unknown words. The main analyzer demonstrates the use of flag diacritics for Turkish, an approach that has so far received little attention for the language. Turkish is an agglutinative language with many exceptions to its phonetic and morphological rules, and flag diacritics are useful for handling these exceptions. The unknown word analyzer needs no additional lexicon: it uses affix stripping, recursively removing affixes from a word to recover candidate lemmata. We use this method to find all possible lemmata that are not in our lexicon.
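The recursive affix stripping described above can be illustrated with a minimal sketch. This is not the analyzer presented in this paper: the suffix list, the function name `candidate_lemmata`, and the minimum stem length are assumptions made only for illustration, and the sketch ignores vowel harmony and all other phonological constraints.

```python
# Hypothetical, very small suffix inventory chosen for illustration only.
# A real analyzer would need the full set of Turkish suffix allomorphs.
SUFFIXES = ("ler", "lar", "de", "da", "im", "ım", "i", "ı")


def candidate_lemmata(word, min_stem_len=2):
    """Return the word plus every stem reachable by recursively
    stripping one known suffix at a time from its end."""
    candidates = {word}
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        # Strip only if the suffix matches and the remaining stem is
        # long enough to be a plausible lemma (assumed threshold).
        if word.endswith(suffix) and len(stem) >= min_stem_len:
            candidates |= candidate_lemmata(stem, min_stem_len)
    return candidates


if __name__ == "__main__":
    # "evlerimde" ("in my houses") = ev + ler + im + de
    print(sorted(candidate_lemmata("evlerimde"), key=len))
```

Without a lexicon to filter the results, such a procedure over-generates: every intermediate form is returned as a candidate, so a practical system would still need additional constraints or ranking to choose among them.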