Finite State Transducer Calculus for Whole Word Morphology

Research on machine learning of morphology often involves formulating morphological descriptions directly on the surface forms of words. Because the established two-level morphology paradigm requires knowledge of the underlying structure, it is not widely used in such settings. In this paper, we propose a formalism for describing structural relationships between words, based on theories of morphology that reject the notions of internal word structure and the morpheme. The formalism covers a wide variety of morphological phenomena, including non-concatenative ones such as stem vowel alternation, without the need for workarounds or extensions. Furthermore, we show that morphological rules formulated in this way can easily be translated into finite-state transducers (FSTs). This enables us to derive efficient approaches to morphological analysis, generation and automatic rule discovery.
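The claim that whole-word rules translate directly into FSTs can be illustrated with a small sketch. It rests on our own assumptions and is not the paper's construction. A rule is written as a sequence of segments. A variable (here `"X"`, `"Y"`) copies one or more arbitrary characters unchanged, and a constant pair `(left, right)` rewrites the literal string `left` into `right`. Each segment becomes a few arcs of a nondeterministic transducer whose arcs may carry multi-character labels. The names `rule_to_fst`, `apply_fst` and `invert`, the alphabet and the example rule (German *Haus* → *Häuser*, with stem vowel alternation and a suffix) are all hypothetical.

```python
import string
from typing import List, Set, Tuple, Union

# Hypothetical rule notation: a variable name copies >= 1 characters,
# a (left, right) pair rewrites the literal `left` into `right`.
Segment = Union[str, Tuple[str, str]]
# Arc: (source state, input label, output label, target state)
Arc = Tuple[int, str, str, int]


def rule_to_fst(rule: List[Segment], alphabet: Set[str]) -> Tuple[List[Arc], int]:
    """Compile a whole-word rule into a transducer; start state is 0."""
    arcs: List[Arc] = []
    state = 0
    for seg in rule:
        nxt = state + 1
        if isinstance(seg, str):
            # Variable: copy the first symbol, then loop to copy any more.
            for c in sorted(alphabet):
                arcs.append((state, c, c, nxt))
                arcs.append((nxt, c, c, nxt))
        else:
            left, right = seg
            # Constant: consume `left`, emit `right` on a single arc.
            arcs.append((state, left, right, nxt))
        state = nxt
    return arcs, state  # `state` is now the only final state


def apply_fst(arcs: List[Arc], final: int, word: str) -> Set[str]:
    """Return every output the transducer produces for `word`."""
    results: Set[str] = set()
    stack = [(0, 0, "")]  # (state, position in word, output so far)
    while stack:
        state, pos, out = stack.pop()
        if state == final and pos == len(word):
            results.add(out)
        for src, inp, outp, dst in arcs:
            if src == state and word.startswith(inp, pos):
                stack.append((dst, pos + len(inp), out + outp))
    return results


def invert(arcs: List[Arc]) -> List[Arc]:
    """Swap input and output labels, turning generation into analysis."""
    return [(src, outp, inp, dst) for src, inp, outp, dst in arcs]


ALPHABET = set(string.ascii_letters) | set("äöüß")
# Hypothetical rule /XaYs/ -> /XäYser/, e.g. Haus -> Häuser.
RULE: List[Segment] = ["X", ("a", "ä"), "Y", ("s", "ser")]

arcs, final = rule_to_fst(RULE, ALPHABET)
print(apply_fst(arcs, final, "Haus"))            # {'Häuser'}
print(apply_fst(invert(arcs), final, "Häuser"))  # {'Haus'}
```

Since every arc either consumes input or moves to a later state, the search always terminates. Inverting the transducer turns the same rule from a generator into an analyzer, which shows how one rule description can serve both directions.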
