Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora

We present a method for automatically learning inflectional classes and associated lemmas from morphologically annotated corpora. The method consists of a core language-independent algorithm, which can be optimized for specific languages. The method is demonstrated on Egyptian Arabic and German, two morphologically rich languages. Our best method for Egyptian Arabic provides an error reduction of 55.6% over a simple baseline; our best method for German achieves a 66.7% error reduction.
