Morfessor 2.0: Toolkit for statistical morphological segmentation

Morfessor is a family of probabilistic machine learning methods for finding morphological segmentations from raw text data. Recent developments include semi-supervised methods that make use of annotated data. Morfessor 2.0 is a rewrite of the original, widely used Morfessor 1.0 software, with well-documented command-line tools and a library interface. It includes new features such as semi-supervised learning, online training, and integrated evaluation code.
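To make the idea of probabilistic segmentation concrete, the following is a minimal sketch of Viterbi decoding under a unigram morph model. It is an illustration only, not the Morfessor 2.0 implementation or its API: the function name `viterbi_segment`, the toy counts, and the lexicon are all hypothetical, and the learning step that would estimate such a lexicon from raw text is omitted.

```python
import math


def viterbi_segment(word, morph_logprob):
    """Return the highest-probability split of `word` into known morphs.

    morph_logprob is a hypothetical toy lexicon mapping each morph
    to its log probability; it is not produced by Morfessor here.
    """
    n = len(word)
    # best[i] = (score of the best segmentation of word[:i],
    #            start index of the last morph in that segmentation)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in morph_logprob and best[start][0] > -math.inf:
                score = best[start][0] + morph_logprob[piece]
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        # No sequence of known morphs covers the word; keep it whole.
        return [word]
    # Follow the back-pointers from the end of the word to the start.
    morphs = []
    end = n
    while end > 0:
        start = best[end][1]
        morphs.append(word[start:end])
        end = start
    return morphs[::-1]


# Toy morph counts (assumed for illustration, not learned from data).
toy_counts = {"walk": 10, "talk": 8, "ed": 12, "ing": 9, "s": 15, "walked": 1}
total = sum(toy_counts.values())
toy_lexicon = {m: math.log(c / total) for m, c in toy_counts.items()}

print(viterbi_segment("walked", toy_lexicon))
print(viterbi_segment("talking", toy_lexicon))
```

With these toy counts, splitting "walked" into two frequent morphs scores higher than the rare whole-word entry, which is the kind of trade-off a probabilistic segmentation model resolves.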
