Learning Probabilistic Paradigms for Morphology in a Latent Class Model

This paper introduces the probabilistic paradigm, a probabilistic, declarative model of morphological structure. We describe an algorithm that recursively applies Latent Dirichlet Allocation with an orthogonality constraint to discover morphological paradigms as the latent classes within a suffix-stem matrix. We apply the algorithm to data preprocessed in several different ways and show that when suffixes are distinguished for part of speech and allomorphs or gender/conjugational variants are merged, the model correctly learns morphological paradigms for English and Spanish. We compare our system with Linguistica (Goldsmith 2001) and discuss the advantages of the probabilistic paradigm over Linguistica's signature representation.
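As an illustration only, the suffix-stem matrix can be sketched in Python. One plausible reading, assumed here rather than taken from the paper, is that each stem is a row (an LDA "document") and each suffix a column (an LDA "word"), so that a paradigm is a latent class with a distribution over suffixes. The `suffix/POS` labelling, the `NULL` marker for the empty suffix, the toy data, and the helper names `build_suffix_stem_matrix` and `suffix_distribution` are all hypothetical. The sketch does not implement the recursive, orthogonality-constrained LDA; the suffix distribution it computes from a hand-chosen set of stems is only a crude stand-in for a learned probabilistic paradigm.

```python
from collections import Counter


def build_suffix_stem_matrix(triples):
    """Build a stem-by-suffix count matrix from (stem, suffix, pos) triples.

    Hypothetical preprocessing: each suffix is labelled with the part of
    speech of the whole word, and the empty suffix is written "NULL".
    """
    counts = Counter()
    for stem, suffix, pos in triples:
        counts[(stem, f"{suffix or 'NULL'}/{pos}")] += 1
    stems = sorted({stem for stem, _ in counts})
    suffixes = sorted({label for _, label in counts})
    # Rows are stems (LDA "documents"); columns are suffixes (LDA "words").
    matrix = [[counts[(s, x)] for x in suffixes] for s in stems]
    return stems, suffixes, matrix


def suffix_distribution(stems, suffixes, matrix, members):
    """Relative frequency of each suffix over the stems in `members`.

    A crude stand-in for a paradigm's distribution over suffixes; in the
    paper the classes are discovered by LDA, not chosen by hand.
    """
    totals = [0] * len(suffixes)
    for stem, row in zip(stems, matrix):
        if stem in members:
            totals = [t + c for t, c in zip(totals, row)]
    n = sum(totals)
    if n == 0:
        return {}
    return {x: t / n for x, t in zip(suffixes, totals) if t}


# Toy, hand-segmented English data (hypothetical, for illustration only).
data = [
    ("walk", "", "VB"), ("walk", "s", "VBZ"), ("walk", "ed", "VBD"),
    ("jump", "", "VB"), ("jump", "s", "VBZ"), ("jump", "ed", "VBD"),
    ("dog", "", "NN"), ("dog", "s", "NNS"),
]
stems, suffixes, matrix = build_suffix_stem_matrix(data)
verb_paradigm = suffix_distribution(stems, suffixes, matrix, {"walk", "jump"})
# verb_paradigm: NULL/VB, ed/VBD and s/VBZ, each with probability 1/3
```

Labelling each suffix with its part of speech keeps nominal plural -s (`s/NNS`) and verbal third-person -s (`s/VBZ`) in separate columns, which is the kind of distinction the abstract reports as necessary. Merging allomorphs (for example, mapping -es onto -s) would be a further relabelling step applied before counting.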

[1]  Alon Lavie, et al. Unsupervised Induction of Natural Language Morphology Inflection Classes, 2004, SIGMORPHON@ACL.

[2]  Daniel Jurafsky, et al. Knowledge-Free Induction of Inflectional Morphologies, 2001, NAACL.

[3]  Mark Johnson, et al. Priors in Bayesian Learning of Phonological Rules, 2004, SIGMORPHON@ACL.

[4]  A. E. Albright, et al. The identification of bases in morphological paradigms, 2002.

[5]  Rémi Zajac. Morpholog: Constrained and Supervised Learning of Morphology, 2001.

[6]  David Yarowsky, et al. Minimally Supervised Morphological Analysis by Multimodal Alignment, 2000, ACL.

[7]  Yu Hu, et al. Using Morphology and Syntax Together in Unsupervised Learning, 2005.

[8]  Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[9]  John A. Goldsmith, et al. Unsupervised Learning of the Morphology of a Natural Language, 2001, CL.

[10]  Xavier Carreras, et al. FreeLing: An Open-Source Suite of Language Analyzers, 2004, LREC.

[11]  Rémi Eyraud, et al. Proceedings of CoNLL, 2006.

[12]  M. McShane, et al. Bootstrapping Morphological Analyzers by Combining Human Elicitation and Machine Learning, 2001, CL.

[13]  Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[14]  Dayne Freitag, et al. Morphology Induction from Term Clusters, 2005, CoNLL.

[15]  Markus Forsberg, et al. Functional morphology, 2004, ICFP '04.

[16]  Gerald Gazdar, et al. DATR: A Language for Lexical Knowledge Representation, 1996, CL.

[17]  Gaja Jarosz, et al. Unsupervised Learning of Morphology Using a Novel Directed Search Algorithm: Taking the First Step, 2002, SIGMORPHON.

[18]  Suresh Manandhar, et al. Unsupervised Learning of Word Segmentation Rules with Genetic Algorithms and Inductive Logic Programming, 2001, Machine Learning.

[19]  Alexander Clark, et al. Learning Morphology with Pair Hidden Markov Models, 2001, ACL.