Evaluation of Finite State Morphological Analyzers Based on Paradigm Extraction from Wiktionary

Wiktionary provides lexical information, including morphological inflection tables, for an increasing number of languages. This makes it a good resource for automatically learning rule-based analyzers of a language's inflectional morphology. This paper presents an extensive evaluation of a method that extracts generalized paradigms from morphological inflection tables; these paradigms can be converted into weighted and unweighted finite-state transducers for morphological parsing and generation. The inflection tables of 55 languages from the English edition of Wiktionary are converted into such generalized paradigms, and the performance of the probabilistic parsers based on these paradigms is tested.
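The abstract does not spell out the extraction procedure. The following is only a rough illustration of the idea of generalizing an inflection table into a paradigm and reusing it for analysis, and it rests on a deliberately simplified assumption: each table has a single variable part (the stem, x1), taken to be the longest substring shared by all forms. The method evaluated in the paper is more general; for example, it can abstract several discontinuous variable parts. All function names, the toy Spanish tables, and the feature labels are hypothetical. The analysis step is unweighted and matches affixes directly, so it neither builds a finite-state transducer nor models the probabilistic ranking that the paper evaluates.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a paradigm maps each feature label to the
# (prefix, suffix) surrounding the single variable part x1 in that slot.
Paradigm = Dict[str, Tuple[str, str]]


def longest_common_substring(words: List[str]) -> str:
    """Return the longest substring contained in every word ("" if none)."""
    shortest = min(words, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in w for w in words):
                return candidate
    return ""


def extract_paradigm(table: Dict[str, str]) -> Paradigm:
    """Generalize one inflection table by abstracting its shared stem to x1."""
    stem = longest_common_substring(list(table.values()))
    paradigm: Paradigm = {}
    for features, form in table.items():
        i = form.find(stem)  # first occurrence; 0 if the stem is empty
        paradigm[features] = (form[:i], form[i + len(stem):])
    return paradigm


def collect_paradigms(tables: List[Dict[str, str]]) -> List[Paradigm]:
    """Extract a paradigm per table, merging tables that generalize identically."""
    seen = set()
    unique: List[Paradigm] = []
    for table in tables:
        p = extract_paradigm(table)
        key = tuple(sorted(p.items()))
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique


def analyze(word: str, paradigms: List[Paradigm], citation: str) -> List[Tuple[str, str]]:
    """Return every (citation form, features) pair that could produce word."""
    analyses = []
    for p in paradigms:
        for features, (prefix, suffix) in p.items():
            end = len(word) - len(suffix)
            # The affixes must match and leave a non-empty stem between them.
            if word.startswith(prefix) and word.endswith(suffix) and end > len(prefix):
                stem = word[len(prefix):end]
                cprefix, csuffix = p[citation]
                analyses.append((cprefix + stem + csuffix, features))
    return analyses


# Toy input: two partial Spanish verb tables with illustrative feature labels.
tables = [
    {"V;NFIN": "hablar", "V;PRS;1;SG": "hablo", "V;PRS;2;SG": "hablas", "V;PRS;3;SG": "habla"},
    {"V;NFIN": "cantar", "V;PRS;1;SG": "canto", "V;PRS;2;SG": "cantas", "V;PRS;3;SG": "canta"},
]
paradigms = collect_paradigms(tables)
print(len(paradigms))  # 1
print(analyze("camino", paradigms, citation="V;NFIN"))  # [('caminar', 'V;PRS;1;SG')]
```

Both toy verbs generalize to the same affix pattern, so they collapse into one paradigm, which can then analyze an unseen form such as "camino". A practical system would instead compile each paradigm into a (possibly weighted) finite-state transducer, which supports both parsing and generation and allows competing analyses to be ranked.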
