Part of Speech Induction from Distributional Features: Balancing Vocabulary and Context

Past research on grammar induction has found promising results in predicting parts of speech from n-grams using a fixed vocabulary and a fixed context. In this study, we investigated grammar induction while varying both vocabulary size and context size. Results indicated that as context increased for a fixed vocabulary, overall accuracy initially increased and then leveled off. Importantly, accuracy did not increase at the same rate across all syntactic categories. We also address the dynamic relation between context and vocabulary in unsupervised grammar induction, and we formulate a model of that relationship. Our results are consistent with what the child language acquisition literature calls the word spurt phenomenon.
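To make the two quantities being varied concrete, the following is a minimal sketch of one way to build distributional features for a given vocabulary size and context size. It is an illustration under stated assumptions, not the procedure used in the study: the function name `context_vectors`, the `<OOV>` placeholder, the position-tagged neighbor features, and the toy sentence are all hypothetical.

```python
from collections import Counter


def context_vectors(tokens, vocab_size, context_size):
    """Count position-tagged neighbors for the most frequent words.

    Assumption (not from the paper): a feature is an (offset, neighbor)
    pair, and neighbors outside the vocabulary collapse to "<OOV>".
    """
    counts = Counter(tokens)
    # Rank by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    vocab = set(ranked[:vocab_size])

    vectors = {w: Counter() for w in vocab}
    for i, word in enumerate(tokens):
        if word not in vocab:
            continue
        for offset in range(-context_size, context_size + 1):
            j = i + offset
            # Skip the target word itself and positions outside the text.
            if offset == 0 or j < 0 or j >= len(tokens):
                continue
            neighbor = tokens[j] if tokens[j] in vocab else "<OOV>"
            vectors[word][(offset, neighbor)] += 1
    return vectors


# Toy input, invented for illustration only.
tokens = "the dog sees the cat and the cat sees the dog".split()
vectors = context_vectors(tokens, vocab_size=4, context_size=1)
```

In this sketch, raising `vocab_size` adds target words and also enlarges the set of distinguishable neighbors, while raising `context_size` adds more positions per target word. Words with similar vectors could then be grouped by any clustering method to induce categories; that step is left out here.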
