Universal Dependencies for Amharic

In this paper, we describe the creation of an Amharic dependency treebank, the first attempt to apply Universal Dependencies (UD) to Amharic. Amharic is a morphologically rich, less-resourced language of the Semitic family. An Amharic orthographic word may bundle information beyond morphology: clitics with grammatical functions attach to the major lexical categories. We first explain the segmentation of clitics, which are difficult to recover from the orthographic word because of morpheme co-occurrence restrictions, assimilation, and clitic ambiguity. We then describe the annotation processes for POS tagging, morphological information, and dependency relations. Following these procedures, we have created a treebank of 1,096 sentences.
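To illustrate how a clitic-segmented orthographic word can be recorded, the following minimal sketch emits CoNLL-U lines, the format used for UD treebanks, in which a multiword token gets a range ID (such as `1-2`) followed by one line per syntactic word. The function name `to_conllu` and the placeholder forms `AB`, `A`, `B`, `C` are assumptions made for illustration; they are not Amharic data and not the authors' tooling, and all annotation columns are left unspecified (`_`).

```python
def to_conllu(words):
    """Render (surface_form, [syntactic_word_forms]) pairs as CoNLL-U lines.

    Hypothetical helper: only ID and FORM are filled in; the remaining
    eight columns (LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC)
    are left as "_".
    """
    lines = []
    idx = 1  # running ID of the next syntactic word
    for surface, parts in words:
        if len(parts) > 1:
            # Multiword token: a range line keeps the orthographic word.
            span = f"{idx}-{idx + len(parts) - 1}"
            lines.append("\t".join([span, surface] + ["_"] * 8))
        for part in parts:
            # One line per syntactic word (host or clitic).
            lines.append("\t".join([str(idx), part] + ["_"] * 8))
            idx += 1
    return "\n".join(lines)


# Placeholder input: "AB" stands for an orthographic word made of a
# host "A" and a clitic "B"; "C" is an unsegmented word.
sentence = [("AB", ["A", "B"]), ("C", ["C"])]
print(to_conllu(sentence))
```

Keeping the range line preserves the original orthographic word alongside its segments, so the surface text can still be reconstructed from the annotated file.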
