Modeling Syntactic Context Improves Morphological Segmentation

The connection between part-of-speech (POS) categories and morphological properties is well-documented in linguistics but underutilized in text processing systems. This paper proposes a novel model for morphological segmentation that is driven by this connection. Our model learns that words with common affixes are likely to be in the same syntactic category and uses learned syntactic categories to refine the segmentation boundaries of words. Our results demonstrate that incorporating POS categorization yields substantial performance gains on morphological segmentation of Arabic.
