Unsupervised Word Segmentation in Context

This paper extends existing word segmentation models to take non-linguistic context into account. It improves the token F-score of a top-performing segmentation model by 2.5% on a 27k-utterance dataset. We posit that word segmentation is easier in context because the learner is not trying to access irrelevant lexical items. We use topics from a Latent Dirichlet Allocation (LDA) model as a proxy for “activity” contexts and use them to label the Providence corpus. We present Adaptor Grammar models that use these context labels, and we study their performance with and without context annotations at test time.
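The sketch below illustrates how LDA topics can serve as context labels. It is not the paper's pipeline. It fits LDA with a collapsed Gibbs sampler and gives each document its most frequent sampled topic. The abstract does not say what counts as a "document" for LDA or which toolkit, number of topics, or hyperparameters were used. The choices here are assumptions made for illustration: each document is a chunk of consecutive utterances given as a token list, and `num_topics`, `alpha`, `beta`, `iters`, and `seed` are arbitrary defaults. The function names `lda_gibbs` and `label_contexts` are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of word tokens (hypothetically,
    a chunk of consecutive utterances). Returns doc-topic counts ndk.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(num_topics)]  # times word v is assigned to topic k
    nk = [0] * num_topics                       # total tokens assigned to topic k
    z = []                                      # current topic of every token

    # Random initialisation of topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk


def label_contexts(docs, num_topics, **kwargs):
    """Label each document with its most frequent sampled topic.

    With a symmetric alpha this topic is also the argmax of the posterior
    mean of theta_d.
    """
    ndk = lda_gibbs(docs, num_topics, **kwargs)
    return [max(range(num_topics), key=lambda t: row[t]) for row in ndk]


if __name__ == "__main__":
    docs = [
        ["ball", "throw", "catch", "ball"],
        ["bath", "soap", "water", "soap"],
        ["catch", "ball", "throw"],
        ["water", "bath", "towel"],
    ]
    print(label_contexts(docs, num_topics=2, iters=50, seed=1))
```

A segmentation model could then read the topic label as a context annotation on every utterance in a document. Taking the single most frequent topic is one simple way to reduce a topic mixture to a discrete context. Other reductions, such as thresholding on topic proportions, are equally plausible.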
