The SIGMORPHON 2016 Shared Task—Morphological Reinflection

The 2016 SIGMORPHON Shared Task was devoted to the problem of morphological reinflection. It introduced morphological datasets for 10 languages with diverse typological characteristics. The shared task drew submissions from 9 teams representing 11 institutions, reflecting a variety of approaches to supervised learning of reinflection. For the simplest task, inflection generation from lemmas, the best system averaged 95.56% exact-match accuracy across all languages, ranging from 88.99% on Maltese to 99.30% on Hungarian. With the relatively large training datasets provided, recurrent neural network architectures consistently performed best, and there was a significant margin between neural and non-neural approaches. The best neural approach, averaged over all tasks and languages, outperformed the best non-neural one by 13.76% absolute; on individual tasks and languages the gap in accuracy sometimes exceeded 60%. Overall, the results show a strong state of the art and serve as encouragement for future shared tasks that explore morphological analysis and generation with varying degrees of supervision.
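As a rough illustration of the metric reported above, the following sketch computes exact-match accuracy per language and then averages those scores across languages. It assumes the cross-language figure is an unweighted mean of per-language accuracies; the function names, language labels, and toy word forms are hypothetical and are not taken from the shared task data or its evaluation script.

```python
from statistics import mean


def exact_match_accuracy(predictions, gold):
    """Fraction of predicted forms that are string-identical to the gold forms."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test set")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def macro_average(per_language_scores):
    """Unweighted mean over languages (an assumption about how scores are averaged)."""
    return mean(per_language_scores.values())


# Hypothetical toy data: (system predictions, gold forms) for two made-up languages.
toy = {
    "lang_a": (["walked", "ran", "goed"], ["walked", "ran", "went"]),
    "lang_b": (["casas", "libros"], ["casas", "libros"]),
}

scores = {lang: exact_match_accuracy(pred, gold) for lang, (pred, gold) in toy.items()}
average = macro_average(scores)
print(scores, average)
```

Under exact match, a prediction that differs from the gold form by even one character counts as wrong, which is why a single over-regularized form in the first toy language lowers its score by a full item.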
