Multiple alignments of inflectional paradigms

Most models of inflectional morphology rely at their core on identifying recurrent and diverging material across inflected forms. Depending on the theoretical framework, this is expressed in terms of morpheme segmentation, rules, processes, patterns or analogies. Finding these recurrences in large structured lexicons is an important step in empirical computational morphology, where analyses are induced bottom-up from inflected forms. This can be done by aligning all the forms in each paradigm, an instance of Multiple Sequence Alignment, a task well known in other fields such as evolutionary biology and historical linguistics. In this paper, we present the specific problems that arise when aligning inflected forms, provide a simple alignment format, define evaluation measures and compare two implemented methods on 13 inflectional lexicons. Our intent is to provide the conditions for the interoperability of future systems, and for incremental improvements in this fundamental step for quantitative morphology.
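To make the notion of aligning inflected forms concrete, the following is a minimal sketch of a pairwise global alignment with unit edit costs. It is not one of the two methods compared in the paper: the function name `align_pair`, the gap symbol and the cost scheme are assumptions chosen for illustration, and a full paradigm would require a multiple alignment rather than a single pair.

```python
def align_pair(a, b, gap="-"):
    """Globally align two forms with unit edit costs.

    Hypothetical helper for illustration only; returns two equal-length
    lists of symbols, padded with `gap` where one form has no counterpart.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal edit cost between a[:i] and b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = cost[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            cost[i][j] = min(substitution, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right cell, preferring match/substitution.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            row_a.append(a[i - 1])
            row_b.append(gap)
            i -= 1
        else:
            row_a.append(gap)
            row_b.append(b[j - 1])
            j -= 1
    return row_a[::-1], row_b[::-1]


if __name__ == "__main__":
    top, bottom = align_pair("walk", "walked")
    print(" ".join(top))
    print(" ".join(bottom))
```

In the resulting alignment, columns where both rows carry the same symbol correspond to recurrent material, and columns with a gap or a mismatch correspond to diverging material. Orthographic strings are used here only for readability; the same sketch would apply to sequences of phonemes.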
