Stochastic Approaches to Morphology Acquisition

One of the first steps in acquiring a morphological system is discovering which phonetic strings correspond to morphemes. These strings can then be analyzed further to determine their grammatical privileges and their contribution to meaning, allowing the child to bootstrap into a functional morphological system. Discovering the relevant phonetic strings is harder than it may appear. Morpheme discovery presents difficulties beyond those that arise in the related task of word discovery and segmentation. Although both tasks require segmenting a continuous speech stream, word segmentation can take advantage of the fact that some words are spoken in isolation, and those words can be used to bootstrap into the segmentation of other words. This strategy works for some morphemes, since many words are monomorphemic, but in many languages, such as English and Spanish, grammatical morphemes are often bound and thus never heard in isolation. Moreover, no simple strategy will universally break a word into its component morphemes. Although grammatical morphemes in many languages occur at either the beginning or the end of a word, a strategy in which the child assumes that the first or last syllable is a morpheme will work only if that assumption matches the language the child is exposed to. Because the affixing languages of the world can have (multiple) prefixes, suffixes, and infixes, such an approach is likely to fail. In addition, acquiring grammatical morphemes is much like acquiring function words: unlike nouns, function words carry little concrete semantic meaning, which likely contributes to the difficulty of learning them (Bird et al. 2001, Caselli et al. 1995, Gentner 1982, Morrison et al. 1997). 
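To make the positional problem concrete, the following Python sketch uses a distributional cue in the spirit of Harris (1955), listed in the references: count how many distinct segments can follow a given word-initial string (successor variety) and how many can precede a given word-final string (predecessor variety). Because both directions are scored, the same procedure can propose a prefix boundary (un|do) or a suffix boundary (walk|ing) without assuming in advance which edge of the word carries the morpheme. The toy lexicon, the additive score, the function names, and the `min_score` cut-off are illustrative assumptions rather than Harris's exact procedure, and letters stand in for phonetic segments.

```python
from collections import defaultdict


def successor_variety(words):
    """Map every prefix in `words` to the number of distinct symbols that follow it.
    '#' stands for the end of a word, so a prefix that is also a whole word counts it."""
    followers = defaultdict(set)
    for w in words:
        for i in range(1, len(w) + 1):
            followers[w[:i]].add(w[i] if i < len(w) else "#")
    return {prefix: len(nxt) for prefix, nxt in followers.items()}


def predecessor_variety(words):
    """Mirror image of successor variety: distinct symbols that precede each suffix."""
    reversed_sv = successor_variety([w[::-1] for w in words])
    return {suffix[::-1]: n for suffix, n in reversed_sv.items()}


def best_split(word, sv, pv, min_score=4):
    """Score every internal cut point by sv(prefix) + pv(suffix) and return the best
    (prefix, suffix) pair, or None if no cut reaches `min_score`.
    Assumption: min_score=4 is an arbitrary cut-off chosen for this toy lexicon."""
    best, best_score = None, min_score - 1
    for i in range(1, len(word)):
        score = sv.get(word[:i], 0) + pv.get(word[i:], 0)
        if score > best_score:
            best, best_score = (word[:i], word[i:]), score
    return best


# Hypothetical toy lexicon mixing suffixed and prefixed forms.
lexicon = ["walk", "walks", "walked", "walking", "jump", "jumps", "jumped", "jumping",
           "play", "plays", "played", "playing", "undo", "redo", "unwrap", "rewrap"]
sv, pv = successor_variety(lexicon), predecessor_variety(lexicon)

for w in ["walking", "jumped", "plays", "undo", "rewrap", "walk"]:
    print(w, best_split(w, sv, pv))
# walking ('walk', 'ing'), jumped ('jump', 'ed'), plays ('play', 's'),
# undo ('un', 'do'), rewrap ('re', 'wrap'), walk None
```

The sketch places at most one cut per word and cannot detect infixes, so it addresses only part of the positional problem; with realistic input it would also need far more data than this toy list for the variety counts to be reliable.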
The search for morpheme forms does have one advantage: a given morpheme generally occurs within particular syntactic environments (e.g., the English morpheme -ing generally occurs with verbs). Although it has been noted that morphology can help a child acquire syntax (Morgan et al. 1987), the reverse may also be true. The relationship between morphology and syntax could help both in discovering bound morphemes and in determining which words a given bound morpheme can attach to. For instance, -ing might be detected as a suffix more readily when only verbs are examined than when all words are examined. Likewise, once a child has discovered that -ing can be applied to a particular verb, extending that ending only to other verbs should greatly reduce overgeneralization errors. Research on morphology discovery models has a long history (e.g., Brent & Cartwright 1996, Goldsmith 2001, Harris 1955). Many of these systems, such as that of Erjavec and Džeroski (2004), are not designed to model child language acquisition but rather to perform computational tasks such as parsing a database. Because we are interested in how children acquire morphological forms, only models of language learning are discussed here. To model children's acquisition of morphological forms, an automatic morphology discovery system must have the following characteristics. First, since morphemes must be learned by the child (i.e., they are highly language specific and thus cannot be innate), the system must use a plausible learning mechanism. This entails using not only information available to the language learner but also mechanisms that children actually possess. Second, because morphemes can appear as (multiple) prefixes, suffixes, and infixes in affixing languages, the system must be flexible about where in the word a morpheme occurs. 
Third, the system must generate a robust list of morphemes that is minimally sufficient to allow the child to bootstrap into the rest of the morphological system. Finally, given that grammatical morphemes generally occur

[1] Elissa L. Newport et al. Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 1987.

[2] P. Jusczyk et al. Phonotactic and Prosodic Effects on Word Segmentation in Infants. Cognitive Psychology, 1999.

[3] John A. Goldsmith et al. Unsupervised Learning of the Morphology of a Natural Language. Computational Linguistics, 2001.

[4] Toben H. Mintz. Category induction from distributional cues in an artificial language. Memory & Cognition, 2002.

[5] Elizabeth Bates et al. A cross-linguistic study of early lexical development. 1995.

[6] Zellig S. Harris et al. From Phoneme to Morpheme. 1955.

[7] Marco Baroni et al. Distribution-driven morpheme discovery: a computational/experimental study. 2003.

[8] Dedre Gentner et al. Why Nouns Are Learned before Verbs: Linguistic Relativity Versus Natural Partitioning. Technical Report No. 257. 1982.

[9] G. Marcus et al. Children's overregularization of English plurals: a quantitative analysis. Journal of Child Language, 1995.

[10] Bruce Hayes et al. Modeling English Past Tense Intuitions with Minimal Generalization. SIGMORPHON, 2002.

[11] D. Howard et al. Age of acquisition and imageability ratings for a large set of words, including verbs and function words. Behavior Research Methods, Instruments, & Computers, 2001.

[12] R. N. Aslin et al. Statistical Learning by 8-Month-Old Infants. Science, 1996.

[13] T. A. Cartwright et al. Distributional regularity and phonotactic constraints are useful for segmentation. Cognition, 1996.

[14] C. Cazden. The acquisition of noun and verb inflections. Child Development, 1968.

[15] Sylvain Neuvel et al. Whole Word Morphologizer: Expanding the Word-Based Lexicon: A Nonstochastic Computational Approach. Brain and Language, 2002.

[16] Daniel Jurafsky et al. Knowledge-Free Induction of Inflectional Morphologies. NAACL, 2001.

[17] P. Jusczyk et al. Infants' sensitivity to phonotactic patterns in the native language. 1994.

[18] Sean A. Fulop et al. Unsupervised Learning of Morphology Without Morphemes. SIGMORPHON, 2002.

[19] Saso Dzeroski et al. Department of Intelligent Systems. 2019.

[20] Toben H. Mintz. Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 2003.

[21] Andrew W. Ellis et al. Age of Acquisition Norms for a Large Set of Object Names and Their Relation to Adult Estimates and Other Variables. 1997.

[22] B. MacWhinney. The CHILDES project: tools for analyzing talk. 1992.

[23] Susana López Ornat et al. La adquisición de la lengua española. 1994.

[24] V. Marchman et al. Overregularization in English plural and past tense inflectional morphology: a response to Marcus (1995). Journal of Child Language, 1997.

[25] Morten H. Christiansen et al. Learning to Segment Speech Using Multiple Cues: A Connectionist Model. 1998.

[26] Richard N. Aslin et al. Statistical learning of new visual feature combinations by infants. Proceedings of the National Academy of Sciences of the United States of America, 2002.