Paper Information - Building a Morphological Analyser for Laz

Building a Morphological Analyser for Laz

This study aims to contribute to the documentation and revitalization of Laz, an endangered member of the South Caucasian language family spoken mainly on the northeastern coastline of Turkey. It takes the first steps towards a general computational model of word form recognition and production for Laz by building a rule-based morphological analyser with the Helsinki Finite-State Toolkit (HFST). In our evaluation, the analyser reaches 64.9% coverage on a corpus of 111,365 tokens collected for this study. We also performed an error analysis on 100 randomly selected tokens from the corpus that the analyser does not cover; the results show that the errors stem mostly from Turkish words in the corpus and from stems missing from our lexicon.
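The coverage figure in the abstract can be read as the share of corpus tokens that receive at least one analysis (about 72,300 of the 111,365 tokens at 64.9%). The following is a minimal sketch of that calculation, assuming this naive definition of coverage. It does not use HFST: `toy_analyse` and `TOY_ANALYSES` are hypothetical stand-ins for a compiled transducer lookup, and the forms and tags are placeholders rather than real Laz data.

```python
from typing import Callable, Iterable, List


def coverage(tokens: Iterable[str], analyse: Callable[[str], List[str]]) -> float:
    """Return the fraction of tokens for which `analyse` yields at least one analysis."""
    total = 0
    covered = 0
    for token in tokens:
        total += 1
        if analyse(token):  # a non-empty list means the token is covered
            covered += 1
    return covered / total if total else 0.0


# Hypothetical stand-in for a transducer lookup: a toy table mapping surface
# forms to analyses. Forms and tags are placeholders, not Laz data.
TOY_ANALYSES = {
    "form_a": ["stem_a<n><sg>"],
    "form_b": ["stem_b<v><pres>"],
}


def toy_analyse(token: str) -> List[str]:
    """Look up a lowercased token; return an empty list when it is unknown."""
    return TOY_ANALYSES.get(token.lower(), [])


corpus = ["form_a", "form_b", "unknown_x", "Form_A"]
print(f"{coverage(corpus, toy_analyse):.1%}")  # prints 75.0%
```

In a real setup the lookup function would query the compiled analyser, and the uncovered tokens could be collected in the same loop to draw a random sample for error analysis.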

Francis Tyers | Esra Onal
